[R-sig-eco] Double zeros and distance measure confusions and thoughts

Cade, Brian Fri, 19 Apr 2013 12:35:05 -0700

Hello All:  This post was motivated by the earlier posts this week
regarding CCA/NMDS/RDA etc and dissimilarity measures.  I have often
thought that the usual thinking on double zeros for species
abundance/composition comparisons across sites has confused several issues
and seems driven by an unrealistic expectation that we can have a useful
dissimilarity/similarity measure to look across multiple species at
multiple sites that will simultaneously provide useful information on
absolute abundance, composition (relative abundance) and presence/absence.
 I developed the examples below several years ago to demonstrate to myself
that Euclidean distance can always be made to perform usefully as a
dissimilarity/similarity measure with double zero data and that the only
issue is whether you want a dissimilarity/similarity measure that is
sensitive to absolute abundance, relative abundance (composition), or
presence/absence. I believe it is unrealistic to think you can have a
dissimilarity/similarity measure that adequately compares two or all three
attributes simultaneously.  I post these examples just to see what others
think about this.


Example problem with interpretation of double 0's (from Legendre and
Legendre 1998:278): 3 sites (rows) with 3 species (columns), done first as
absolute abundances, then as relative abundances (composition), both using
Euclidean distances.

> x
     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    1    0    0
[3,]    0    4    4
>
> dist(x)
         1        2
2 1.732051
3 4.242641 5.744563
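
For anyone who wants to reproduce the numbers above, here is a minimal
sketch that rebuilds the matrix and writes the Euclidean distance out by
hand. The helper name euclid is my own, used only for illustration; dist()
is base R and uses the Euclidean method by default.

```r
# Rebuild the 3 x 3 abundance matrix shown above (rows = sites, cols = species).
x <- matrix(c(0, 1, 1,
              1, 0, 0,
              0, 4, 4), nrow = 3, byrow = TRUE)

# Euclidean distance between two sites, written out by hand
# (euclid is a hypothetical helper, not part of the original post).
euclid <- function(a, b) sqrt(sum((a - b)^2))

d12 <- euclid(x[1, ], x[2, ])  # sqrt(1 + 1 + 1)
d13 <- euclid(x[1, ], x[3, ])  # sqrt(0 + 9 + 9)
d23 <- euclid(x[2, ], x[3, ])  # sqrt(1 + 16 + 16)

# as.vector(dist(x)) lists the pairs in the order (2,1), (3,1), (3,2).
all.equal(c(d12, d13, d23), as.vector(dist(x)))
```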

Legendre and Legendre (1998) and others argue that site 1 should be closer
to site 3 than to site 2 because sites 1 and 3 share two species (2 and 3),
whereas sites 1 and 2 share no species. Hidden in this statement about
"sharing" is really a compositional type of interpretation, i.e., relative
abundance. So convert the species abundances to relative proportions
(compositions), and Euclidean distance then indicates that sites 1 and 3
are identical in composition (distance 0) and that sites 1 and 2 and
sites 2 and 3 are equally dissimilar.

> xc
     [,1] [,2] [,3]
[1,]    0  0.5  0.5
[2,]    1  0.0  0.0
[3,]    0  0.5  0.5
> dist(xc)
         1        2
2 1.224745
3 0.000000 1.224745
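
The conversion to relative proportions can be sketched as below. This
assumes the base R function prop.table() with margin = 1 (divide each row
by its row total) is an acceptable way to build xc; the post does not say
how xc was actually produced.

```r
# Absolute abundances from the first example (rows = sites, cols = species).
x <- matrix(c(0, 1, 1,
              1, 0, 0,
              0, 4, 4), nrow = 3, byrow = TRUE)

# Relative abundances: each row divided by its own total.
xc <- prop.table(x, margin = 1)

# Every site now sums to 1, so only composition is left to compare.
rowSums(xc)

# Same distance measure as before (Euclidean), applied to compositions.
d_rel <- dist(xc)
d_rel
```

Note that a site with no individuals at all has a row total of 0, so this
division would return NaN for that row; such sites would need to be
handled separately.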

Notice that no change in distance measure (Euclidean) was required, just a
change from absolute to relative abundances.  I would easily argue that if
you were interested in absolute abundances, the Euclidean distances in the
first computation above correctly recognize that sites 1 and 2 are closer
than sites 1 and 3, which are slightly closer than sites 2 and 3. So I
don't believe the use of Euclidean distances is inappropriate for
abundance data, just that if you want a compositional type of
interpretation (proportion of species shared) you ought to convert
abundances to relative proportions prior to using Euclidean distances.

Similar arguments could be made for the data below from McCune and
Grace (2002) box 6.2, where the differences between sites 1 and 2 and
between sites 1 and 3 were compared with Euclidean versus city-block
distance. City-block distance says the differences between sites 1 and 2
and between sites 1 and 3 are the same (12), whereas the Euclidean
distances are 9.165 versus 6.0.  Note that converting to relative
abundances yields Euclidean distances of 0.575 and 0.267, which seems
reasonable in a compositional type of interpretation.  The city-block
distance seems to be emphasizing a compositional interpretation based on
presence/absence, i.e., the proportion of species with any abundance > 0.

> x2
     [,1] [,2] [,3] [,4]
[1,]    4    2    0    1
[2,]    5    1    1   10
[3,]    7    5    3    4
> dist(x2)
         1        2
2 9.165151
3 6.000000 7.745967
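
The city-block figures quoted above can be checked with the "manhattan"
method of base R's dist(). This is only a sketch for verification; the
object names d_city and d_eucl are mine.

```r
# McCune and Grace (2002) box 6.2 data as shown above (rows = sites).
x2 <- matrix(c(4, 2, 0,  1,
               5, 1, 1, 10,
               7, 5, 3,  4), nrow = 3, byrow = TRUE)

# City-block (Manhattan) distance: sum of absolute differences per species.
d_city <- dist(x2, method = "manhattan")

# Euclidean distance: square root of the sum of squared differences.
d_eucl <- dist(x2)

d_city
d_eucl
```

Site 2 differs from site 1 mostly in a single species (1 versus 10), while
site 3 differs by 3 in every species. Squaring the differences weights the
one large difference more heavily, which is why the two measures disagree.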

### Convert to relative proportions
> x2c <- matrix(c(4/7,5/17,7/19,2/7,1/17,5/19,0,1/17,3/19,1/7,10/17,4/19),
+               nrow=3, ncol=4)
> x2c
          [,1]       [,2]       [,3]      [,4]
[1,] 0.5714286 0.28571429 0.00000000 0.1428571
[2,] 0.2941176 0.05882353 0.05882353 0.5882353
[3,] 0.3684211 0.26315789 0.15789474 0.2105263

> dist(x2c)
          1         2
2 0.5746326
3 0.2668908 0.4469370
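
As a cross-check on the hand-typed fractions, the same matrix can be
derived from the raw abundances. A minimal sketch, again assuming base R's
prop.table() with margin = 1; the name x2c_hand is mine and simply holds
the matrix typed above.

```r
# Raw abundances (rows = sites, cols = species).
x2 <- matrix(c(4, 2, 0,  1,
               5, 1, 1, 10,
               7, 5, 3,  4), nrow = 3, byrow = TRUE)

# The hand-typed relative proportions (filled column by column).
x2c_hand <- matrix(c(4/7, 5/17, 7/19, 2/7, 1/17, 5/19,
                     0, 1/17, 3/19, 1/7, 10/17, 4/19),
                   nrow = 3, ncol = 4)

# Row totals are 7, 17 and 19, the denominators used above.
rowSums(x2)

# Derived version: each row divided by its total.
x2c <- prop.table(x2, margin = 1)

# The two constructions should agree.
all.equal(x2c, x2c_hand)

d_x2c <- dist(x2c)
d_x2c
```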

We can also convert the species abundances to presence/absence (1/0) form
and use Euclidean distances.  This now has an interpretation comparable to
the city-block distance, i.e., sites 1 and 2 and sites 1 and 3 differ by
the same amount and sites 2 and 3 don't differ.  So there is really no
need to change to a distance measure other than Euclidean; we just need to
have the species measures in absolute abundances, relative abundances
(compositions), or presence/absence depending on the desired
interpretation.  Can different distance measures really be expected to
provide simultaneous interpretations in terms of absolute abundance,
relative abundance, and presence/absence?  I don't think so.

> x2d
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    1
[2,]    1    1    1    1
[3,]    1    1    1    1

> dist(x2d)
  1 2
2 1
3 1 0
>
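
The three choices of scale discussed above can be collected in one small
helper. This is only a sketch: site_dist and its scale argument are
hypothetical names of my own, not an existing function, and the
presence/absence step assumes that any abundance greater than 0 counts as
present.

```r
# Raw abundances (rows = sites, cols = species).
x2 <- matrix(c(4, 2, 0,  1,
               5, 1, 1, 10,
               7, 5, 3,  4), nrow = 3, byrow = TRUE)

# Hypothetical helper: rescale the data, then always use Euclidean distance.
site_dist <- function(x, scale = c("absolute", "relative", "presence")) {
  scale <- match.arg(scale)
  y <- switch(scale,
              absolute = x,                             # leave as is
              relative = prop.table(x, margin = 1),     # row proportions
              presence = (x > 0) * 1)                   # 1 if present, else 0
  dist(y)
}

d_abs  <- site_dist(x2, "absolute")
d_rel  <- site_dist(x2, "relative")
d_pres <- site_dist(x2, "presence")

d_pres
```

On 0/1 data the Euclidean distance is the square root of the number of
species present at one site but not the other, so it ranks pairs of sites
in the same order as the city-block distance on the same 0/1 data.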

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov <brian_c...@usgs.gov>
tel:  970 226-9326
