[R-sig-eco] Package 'compositions'; Function dnorm.acomp()

2014-09-30 Thread Rich Shepard

  For a data set of count proportions, testing for fit to a multivariate
normal distribution is done with the function dnorm.acomp() in package
'compositions'. The function's calling parameters are the data set, mean,
and variance.

  Example data set:


dput(win.acomp)
structure(c(0.0667, 0.0612061206120612, 0.0435, 0.044, 0.05, 
0.0161, 0.6, 0.571457145714572, 0.6232, 0.5934, 0.4333, 0.629, 
0.0667, 0.0612061206120612, 0.1014, 0.0659, 0.0667, 0.0323, 0.2444, 
0.265326532653265, 0.2174, 0.2637, 0.3667, 0.2903, 0.0222, 0.0408040804080408, 
0.0145, 0.033, 0.0833, 0.0323), .Dim = c(6L, 5L), .Dimnames = list(

NULL, c(Filterer, Gatherer, Grazer, Predator, Shredder
)), class = acomp)

  The mean() function returns the mean value for each column:


mean(win.acomp)

  Filterer   Gatherer Grazer   Predator   Shredder
0.04386630 0.58270151 0.06366245 0.27664502 0.03312472

and the multivariate function, mvar(), returns a single value:


mvar(win.acomp)

[1] 0.6309852

  The dnorm.acomp() syntax, according to ?dnorm.acomp has a single value for
the mean:

dnorm.acomp(x,mean,var)

which raises the question of which mean value do I use for a data set?

TIA,

Rich

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Package 'compositions'; Function dnorm.acomp()

2014-09-30 Thread separent
Hi Rich,




Filzmoser et al. (2009) wrote that Some measures like the standard deviation 
(or the variance) make no statistical sense with closed data [...]. They also 
wrote that If Euclidean geometry is not valid, the arithmetic mean is quite 
likely to be a poor estimate of the data center.




See Filzmoser et al. (2009) 
http://www.statistik.tuwien.ac.at/forschung/SM/SM-2009-2.pdf




As Euclidean geometry is not valid for compositions, you have to compute the 
mean in the ilr or clr space (both are euclidean, alr is not). The mean.acomp 
function computes the mean in euclidean space, then back-transform the result 
in the compositional space.




library(compositions)

# Data
comp = matrix(c(0.0667, 0.0612061206120612, 0.0435, 0.044, 0.05, 
  0.0161, 0.6, 0.571457145714572, 0.6232, 0.5934, 0.4333, 0.629, 
  0.0667, 0.0612061206120612, 0.1014, 0.0659, 0.0667, 0.0323, 0.2444, 
  0.265326532653265, 0.2174, 0.2637, 0.3667, 0.2903, 0.0222, 
0.0408040804080408, 
  0.0145, 0.033, 0.0833, 0.0323), ncol=5)

# Mean
colMeans(unclass(comp)) ## biased mean
mean(comp) ## unbiased mean, calls mean.acomp under the hood




sbp = matrix(c( 1, 1, 1,-1,-1, ## A dummy sequential binary partition
1,-1,-1, 0, 0,
0, 1,-1, 0, 0,
0, 0, 0, 1,-1),
 ncol=5, byrow=TRUE)
psi = gsi.buildilrBase(t(sbp)) ## The orthonormal matrix
balances = ilr(comp, V=psi) ## computing the orthonormal balances
bal_mean = colMeans(balances) ## means of balances
ilrInv(bal_mean, V=psi) ## back-transform the mean in the compositional space




You see that the back-transformed mean is equal to mean.acomp(comp).




The total variance estimator is computed using eq. 10 in Filzmoser et al. 
(2009). This is what mvar does.




# Variance
sum1 = 0
for (i in 1:(ncol(comp)-1)) {
  sum2 = 0
  for (j in (i+1):ncol(comp)) {
sum2 = sum2 + var(log(comp[,i]/comp[,j]))
  }
  sum1 = sum1+sum2
}
tot_var = sum1/ncol(comp)
tot_var
mvar(comp)




The variance-covariance matrix of compositions should be computed in a 
log-ratio space. So var, sd, confidence intervals and p-values should be 
computed on your transformed data. Although confidence intervals on 
compositions are widely seen in the litterature, they can be misleading.




I prefer to compute the variance in the ilr space and put the confidence 
intervals in a CoDaDendrogram, then put only the means on compositions in a 
table below the dendrogram, as in Figure 5 of Parent et al. (2012) - I’ll send 
you the plot function if you want it, 
http://www.frontiersin.org/files/Articles/63683/fpls-04-00449-HTML/image_m/fpls-04-00449-g005.JPG




Regards,




Serge-Étienne








De : Rich Shepard
Envoyé : ‎mardi‎, ‎30‎ ‎septembre‎ ‎2014 ‎12‎:‎45
À : r-sig-ecology@r-project.org





   For a data set of count proportions, testing for fit to a multivariate
normal distribution is done with the function dnorm.acomp() in package
'compositions'. The function's calling parameters are the data set, mean,
and variance.

   Example data set:

dput(win.acomp)
structure(c(0.0667, 0.0612061206120612, 0.0435, 0.044, 0.05, 
0.0161, 0.6, 0.571457145714572, 0.6232, 0.5934, 0.4333, 0.629, 
0.0667, 0.0612061206120612, 0.1014, 0.0659, 0.0667, 0.0323, 0.2444, 
0.265326532653265, 0.2174, 0.2637, 0.3667, 0.2903, 0.0222, 0.0408040804080408, 
0.0145, 0.033, 0.0833, 0.0323), .Dim = c(6L, 5L), .Dimnames = list(
 NULL, c(Filterer, Gatherer, Grazer, Predator, Shredder
 )), class = acomp)

   The mean() function returns the mean value for each column:

 mean(win.acomp)
   Filterer   Gatherer Grazer   Predator   Shredder
0.04386630 0.58270151 0.06366245 0.27664502 0.03312472

and the multivariate function, mvar(), returns a single value:

 mvar(win.acomp)
[1] 0.6309852

   The dnorm.acomp() syntax, according to ?dnorm.acomp has a single value for
the mean:

dnorm.acomp(x,mean,var)

which raises the question of which mean value do I use for a data set?

TIA,

Rich

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Package 'compositions'; Function dnorm.acomp()

2014-09-30 Thread Rich Shepard

On Tue, 30 Sep 2014, separ...@yahoo.com wrote:


See Filzmoser et al. (2009)
http://www.statistik.tuwien.ac.at/forschung/SM/SM-2009-2.pdf


Serge-Étienne,

  I'll carefully read this; have not found it in my literature surches.


Filzmoser et al. (2009) wrote that Some measures like the standard
deviation (or the variance) make no statistical sense with closed data
[...].


  It is also difficult to make ecological sense from these measures,
particularly when communicating with non-technical decision-makers.


They also wrote that If Euclidean geometry is not valid, the arithmetic
mean is quite likely to be a poor estimate of the data center. As
Euclidean geometry is not valid for compositions, you have to compute the
mean in the ilr or clr space (both are euclidean, alr is not). The
mean.acomp function computes the mean in euclidean space, then
back-transform the result in the compositional space.


  My developing understanding of log ratio transformations is that they only
apply to data such as chemical concentrations. The transformation serves to
close the data so the row totals are 1.0 or 100. With count data, the 
proportions
represent the column count as a proportion of the row total so converstion
to the compositional space and closure is not necessary as they already sum
to 1.0. Perhaps the Filzmoser et al. paper will clear up and correct my
understanding of how count data should be handled.

  What I need to do is calculate a valid measure of the variability of these
count data in each data set, compare them (perhaps by clustering), identify
explanatory geomorphic, hydrologic, and/or chemical data via regression
analysis, and examine temporal changes within each data set.

  I'm certainly open to suggestions and advice as CoDA is new to me.

Thanks again,

Rich

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology