Re: [R-sig-eco] Package 'compositions': Interpreting dist() output

2014-10-14 Thread separent
Rich,




Distances can be interpreted as degrees of dissimilarity: the smaller the 
distance, the more similar the two observations. In your dataset, it means 
that the observation in 2004 is more similar to 2011 (distance 0.58) than to 
2013 (distance 1.50). You can visualize distances using distance-based 
clustering (http://www.statmethods.net/advstats/cluster.html) or 
multidimensional scaling (http://www.statmethods.net/advstats/mds.html).
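
As a rough sketch of both suggestions, the distance matrix posted in this thread can be typed in by hand and passed to base R's hclust() and cmdscale(). The average-linkage method and the object names are assumptions chosen for illustration, not something stated in the thread.

```r
# Lower triangle of the posted distance matrix, entered column by column
years <- c("2004", "2005", "2006", "2011", "2012", "2013")
vals <- c(0.5917687, 0.7084941, 0.5796871, 1.3615670, 1.4955697,  # 2004 vs later years
          1.1382195, 0.3503394, 0.8098764, 1.2024123,             # 2005 vs later years
          0.9175847, 1.7682454, 1.6751463,                        # 2006 vs later years
          0.9206943, 1.0146711,                                   # 2011 vs later years
          1.2160550)                                              # 2012 vs 2013

m <- matrix(0, nrow = 6, ncol = 6, dimnames = list(years, years))
m[lower.tri(m)] <- vals        # lower.tri() fills in column-major order
m <- m + t(m)                  # make the matrix symmetric
d <- as.dist(m)

# Hierarchical clustering (linkage method is an assumption)
hc <- hclust(d, method = "average")
plot(hc, main = "Clustering of years")

# Classical (metric) multidimensional scaling in two dimensions
mds <- cmdscale(d, k = 2)
plot(mds, type = "n", xlab = "Axis 1", ylab = "Axis 2")
text(mds, labels = years)
```

Because agglomerative clustering always merges the closest pair first, 2005 and 2011 (distance 0.35) join at the bottom of the dendrogram.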




--

Essi






From: Rich Shepard
Sent: Monday, 13 October 2014 17:34
To: r-sig-ecology@r-project.org





On Mon, 13 Oct 2014, Rich Shepard wrote:

          2004      2005      2006      2011      2012
2005 0.5917687
2006 0.7084941 1.1382195
2011 0.5796871 0.3503394 0.9175847
2012 1.3615670 0.8098764 1.7682454 0.9206943
2013 1.4955697 1.2024123 1.6751463 1.0146711 1.2160550

Rich

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] CoDA: Clustering Multiple Data Sets

2014-10-10 Thread separent
Hi Rich,


It is not clear whether you need a supervised or an unsupervised model. 
Clustering is unsupervised: it will classify compositions into hierarchical 
groups regardless of their labels (countries, regions). If this is what you 
intend, you might compute the clustering (hclust) on a Euclidean distance 
matrix (vegdist, in package vegan) computed on the clr- or ilr-transformed 
data (both return the same distances). If you mean a supervised approach, you 
might want to explain how groups differ, and/or predict to which group a 
composition belongs. To explain, discriminant analysis (packages MASS or ade4) 
is (arguably) often a good choice. To predict a category, you might look at 
machine learning techniques (see the caret package, among many others).
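
The unsupervised route described above can be sketched in base R alone. The data below are made up for illustration, the clr transform is written by hand, and stats::dist() stands in for vegdist() (for Euclidean distances the two should agree); the Ward linkage and the choice of three groups are likewise assumptions.

```r
# Hypothetical compositions: 6 samples (rows) x 4 parts (columns), rows sum to 1
x <- rbind(
  s1 = c(0.10, 0.60, 0.20, 0.10),
  s2 = c(0.12, 0.58, 0.18, 0.12),
  s3 = c(0.30, 0.30, 0.30, 0.10),
  s4 = c(0.28, 0.32, 0.28, 0.12),
  s5 = c(0.05, 0.15, 0.40, 0.40),
  s6 = c(0.06, 0.14, 0.42, 0.38)
)

# Centered log-ratio transform: log of each part minus the row mean of the logs
clr_transform <- function(comp) {
  logs <- log(comp)
  logs - rowMeans(logs)
}

z <- clr_transform(x)
d <- dist(z, method = "euclidean")   # Aitchison distances between samples

# Labels play no role here: the grouping comes from the distances only
hc <- hclust(d, method = "ward.D2")
groups <- cutree(hc, k = 3)
print(groups)
plot(hc)
```

With real data, the rows would be the observed compositions (for example one row per stream and year), and the number of groups would be read off the dendrogram rather than fixed in advance.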



Regards,


Essi









From: Rich Shepard
Sent: Thursday, 9 October 2014 15:13
To: r-sig-ecology@r-project.org





   The documentation for packages compositions and robCompositions describes
distance measures and (in the former package) clustering. However, all the
examples, and the function syntax, apply to a single data set.

   This works well with geochemical and official statistical data when the
goal is to examine relationships among the components in the data set. I
find no examples for clustering multiple compositional data sets. For
example, suppose the expenditures (or expendituresEU) data sets in
robCompositions included data from multiple countries and the analytical goal
were to cluster the countries based on each one's compositional data set. The
AnimalVegetation data set in the compositions package compares [A]real
compositions by abundance of vegetation and animals for 50 plots in each of
regions A and B and appears to be similar to my data: macroinvertebrate
compositions by functional feeding groups over multiple (and a variable number
of) years in each of 6 stream networks; each stream network is a separate data
set. I want to cluster the streams based on each data set. Unfortunately, I do
not see an example in package compositions that uses the AnimalVegetation data
for clustering.

   The hclust() function in the stats and compositions packages (perhaps the
latter calls the function in the former package) appears to be limited to a
single data set.

   What package and function will allow me to calculate a distance matrix for
these 6 compositional data sets, then use those distances for hierarchical
clustering?

Rich



Re: [R-sig-eco] Package 'compositions'; Function dnorm.acomp()

2014-09-30 Thread separent
Hi Rich,




Filzmoser et al. (2009) wrote that "Some measures like the standard deviation 
(or the variance) make no statistical sense with closed data [...]". They also 
wrote that "If Euclidean geometry is not valid, the arithmetic mean is quite 
likely to be a poor estimate of the data center."




See Filzmoser et al. (2009) 
http://www.statistik.tuwien.ac.at/forschung/SM/SM-2009-2.pdf




As Euclidean geometry is not valid for compositions, you have to compute the 
mean in the ilr or clr space (both are isometric, so Euclidean geometry 
applies to the transformed data; alr is not). The mean.acomp function computes 
the mean in that Euclidean space, then back-transforms the result to the 
compositional space.




library(compositions)

# Data: 6 samples (rows) x 5 feeding groups (columns)
# acomp() sets the class, so that mean() dispatches to mean.acomp
comp = acomp(matrix(c(0.0667, 0.0612061206120612, 0.0435, 0.044, 0.05, 
  0.0161, 0.6, 0.571457145714572, 0.6232, 0.5934, 0.4333, 0.629, 
  0.0667, 0.0612061206120612, 0.1014, 0.0659, 0.0667, 0.0323, 0.2444, 
  0.265326532653265, 0.2174, 0.2637, 0.3667, 0.2903, 0.0222, 
  0.0408040804080408, 
  0.0145, 0.033, 0.0833, 0.0323), ncol=5))

# Mean
colMeans(unclass(comp)) ## arithmetic mean of the raw proportions (biased)
mean(comp) ## compositional mean, calls mean.acomp under the hood




sbp = matrix(c( 1, 1, 1,-1,-1, ## A dummy sequential binary partition
                1,-1,-1, 0, 0,
                0, 1,-1, 0, 0,
                0, 0, 0, 1,-1),
             ncol=5, byrow=TRUE)
psi = gsi.buildilrBase(t(sbp)) ## The orthonormal matrix
balances = ilr(comp, V=psi) ## computing the orthonormal balances
bal_mean = colMeans(balances) ## means of balances
ilrInv(bal_mean, V=psi) ## back-transform the mean to the compositional space




You see that the back-transformed mean is equal to mean.acomp(comp).
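
For readers without the package at hand, the same center can be sketched in base R: averaging in clr space and back-transforming amounts to taking the geometric mean of each part and re-closing to a sum of 1. The helper name closure() is hypothetical; the data are the ones posted in this thread, so the result should correspond to the mean(win.acomp) output quoted further down.

```r
# Data posted in this thread: 6 samples (rows) x 5 feeding groups (columns)
comp <- matrix(c(0.0667, 0.0612061206120612, 0.0435, 0.044, 0.05, 0.0161,
                 0.6, 0.571457145714572, 0.6232, 0.5934, 0.4333, 0.629,
                 0.0667, 0.0612061206120612, 0.1014, 0.0659, 0.0667, 0.0323,
                 0.2444, 0.265326532653265, 0.2174, 0.2637, 0.3667, 0.2903,
                 0.0222, 0.0408040804080408, 0.0145, 0.033, 0.0833, 0.0323),
               ncol = 5,
               dimnames = list(NULL, c("Filterer", "Gatherer", "Grazer",
                                       "Predator", "Shredder")))

# Hypothetical helper: rescale a vector so that it sums to 1
closure <- function(v) v / sum(v)

# Route 1: geometric mean of each part, then closure
center_geo <- closure(exp(colMeans(log(comp))))

# Route 2: average the clr coordinates, then invert the clr (exp + closure)
clr <- log(comp) - rowMeans(log(comp))
center_clr <- closure(exp(colMeans(clr)))

print(center_geo)
print(all.equal(center_geo, center_clr))
```

The two routes agree because the clr coordinates differ from the plain logs only by a per-row constant, which the closure step removes.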




The total variance estimator is computed using eq. 10 in Filzmoser et al. 
(2009). This is what mvar does.




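# Variance
sum1 = 0
for (i in 1:(ncol(comp)-1)) {
  sum2 = 0
  for (j in (i+1):ncol(comp)) {
    # variance of the log-ratio between parts i and j
    sum2 = sum2 + var(log(comp[,i]/comp[,j]))
  }
  sum1 = sum1+sum2
}
tot_var = sum1/ncol(comp) # sum over all pairs i < j, divided by the number of parts
tot_var
mvar(comp)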
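
An equivalent way to look at the same quantity, sketched here in base R on made-up data: the pairwise log-ratio sum divided by the number of parts equals the sum of the variances of the clr coordinates. The random data and the object names are assumptions for illustration only.

```r
set.seed(1)
x <- matrix(runif(8 * 5, min = 0.05, max = 1), ncol = 5)  # made-up positive data
x <- x / rowSums(x)                                        # close rows to sum to 1
D <- ncol(x)

# Pairwise form: variances of all log-ratios with i < j, divided by D
pairs <- combn(D, 2)
tot_var_pairs <- sum(apply(pairs, 2, function(ij) {
  var(log(x[, ij[1]] / x[, ij[2]]))
})) / D

# clr form: sum of the variances of the clr coordinates
clr <- log(x) - rowMeans(log(x))
tot_var_clr <- sum(apply(clr, 2, var))

print(c(pairwise = tot_var_pairs, clr = tot_var_clr))
```

The identity holds because the clr coordinates of each sample sum to zero, so the covariance terms in the pairwise expansion cancel against the variances.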




The variance-covariance matrix of compositions should be computed in a 
log-ratio space. Variances, standard deviations, confidence intervals and 
p-values should therefore be computed on your transformed data. Although 
confidence intervals on raw compositions are widely seen in the literature, 
they can be misleading.
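
A minimal base R sketch of that workflow, on made-up data: confidence intervals are computed coordinate by coordinate in an ilr space, and only the point estimate is mapped back to proportions. The Helmert-based basis is just one convenient orthonormal basis (an assumption here, not the partition used elsewhere in this thread), and the t-based interval is likewise an illustrative choice.

```r
set.seed(42)
x <- matrix(runif(12 * 4, min = 0.05, max = 1), ncol = 4)  # made-up data
x <- x / rowSums(x)                                         # close rows to sum to 1
D <- ncol(x)

# Orthonormal basis of the clr plane built from Helmert contrasts
V <- contr.helmert(D)
V <- sweep(V, 2, sqrt(colSums(V^2)), "/")   # normalize each column to length 1

clr <- log(x) - rowMeans(log(x))
ilr <- clr %*% V                            # n x (D - 1) orthonormal coordinates

# t-based 95% confidence interval for the mean of each coordinate
means <- colMeans(ilr)
ci <- t(apply(ilr, 2, function(z) t.test(z)$conf.int))
colnames(ci) <- c("lower", "upper")
print(cbind(mean = means, ci))

# Only the point estimate is mapped back to the simplex
center <- exp(drop(V %*% means))
center <- center / sum(center)
print(center)
```

The intervals live in log-ratio coordinates, which is why they are reported there rather than as symmetric error bars around the proportions.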




I prefer to compute the variance in the ilr space and put the confidence 
intervals in a CoDaDendrogram, then report only the compositional means in a 
table below the dendrogram, as in Figure 5 of Parent et al. (2012). I’ll send 
you the plot function if you want it: 
http://www.frontiersin.org/files/Articles/63683/fpls-04-00449-HTML/image_m/fpls-04-00449-g005.JPG




Regards,




Serge-Étienne








From: Rich Shepard
Sent: Tuesday, 30 September 2014 12:45
To: r-sig-ecology@r-project.org





   For a data set of count proportions, testing for fit to a multivariate
normal distribution is done with the function dnorm.acomp() in package
'compositions'. The function's calling parameters are the data set, mean,
and variance.

   Example data set:

dput(win.acomp)
structure(c(0.0667, 0.0612061206120612, 0.0435, 0.044, 0.05, 
0.0161, 0.6, 0.571457145714572, 0.6232, 0.5934, 0.4333, 0.629, 
0.0667, 0.0612061206120612, 0.1014, 0.0659, 0.0667, 0.0323, 0.2444, 
0.265326532653265, 0.2174, 0.2637, 0.3667, 0.2903, 0.0222, 0.0408040804080408, 
0.0145, 0.033, 0.0833, 0.0323), .Dim = c(6L, 5L), .Dimnames = list(
 NULL, c("Filterer", "Gatherer", "Grazer", "Predator", "Shredder"
 )), class = "acomp")

   The mean() function returns the mean value for each column:

 mean(win.acomp)
  Filterer   Gatherer     Grazer   Predator   Shredder
0.04386630 0.58270151 0.06366245 0.27664502 0.03312472

and the multivariate function, mvar(), returns a single value:

 mvar(win.acomp)
[1] 0.6309852

   The dnorm.acomp() syntax, according to ?dnorm.acomp has a single value for
the mean:

dnorm.acomp(x,mean,var)

which raises the question: which mean value do I use for a data set?

TIA,

Rich



Re: [R-sig-eco] Measurement distance for proportion data

2014-05-13 Thread separent
I would also suggest trying the Aitchison distance. To do so, you can use the 
“compositions” package. You transform the proportions to centered log-ratios 
or isometric log-ratios (clr and ilr functions, respectively), then compute 
the Euclidean distance on the transformed data; both transformations return 
the same distances.


library(compositions)
library(vegan)
data(AnimalVegetation)
region = factor(ifelse(AnimalVegetation[,5]==1, "A", "B")) # region label
comp = acomp(AnimalVegetation[,1:4]) # proportions closed between 0 and 1
# perturbation on region A for testing purposes:
# comp[region=="A",] = acomp(comp[region=="A",]) + c(1,1,2,1)
bal = ilr(comp) # isometric log-ratios

aitch = vegdist(bal, method="euclidean") # Aitchison dissimilarity matrix
mod = betadisper(aitch, region) # multivariate dispersion within each region
mod
plot(mod)
adonis(aitch ~ region) # permutational MANOVA; in recent vegan versions, use adonis2()
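
The claim that clr and ilr give the same distances can be checked with a short base R sketch on made-up data. The transforms are written by hand, the Helmert-based orthonormal basis is an assumption (any orthonormal basis of the clr plane should behave the same way), and the alr is included only as a contrast.

```r
set.seed(7)
x <- matrix(runif(10 * 4, min = 0.05, max = 1), ncol = 4)  # made-up data
x <- x / rowSums(x)                                         # close rows to sum to 1
D <- ncol(x)

# Centered log-ratios
clr <- log(x) - rowMeans(log(x))

# Isometric log-ratios: project clr onto an orthonormal basis of the clr plane
V <- contr.helmert(D)
V <- sweep(V, 2, sqrt(colSums(V^2)), "/")
ilr <- clr %*% V

# Additive log-ratios, using the last part as the denominator
alr <- log(x[, -D] / x[, D])

d_clr <- dist(clr)
d_ilr <- dist(ilr)
d_alr <- dist(alr)

print(isTRUE(all.equal(as.vector(d_clr), as.vector(d_ilr))))  # clr vs ilr
print(isTRUE(all.equal(as.vector(d_clr), as.vector(d_alr))))  # clr vs alr
```

The first comparison succeeds because the ilr is a rotation of the clr coordinates; the alr is an oblique transformation, so its Euclidean distances generally differ.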


Cheers,


Essi Parent






From: Jari Oksanen
Sent: Tuesday, 13 May 2014 11:21
To: Zbigniew Ziembik
Cc: r-sig-ecology@r-project.org





Typical dissimilarity indices are of form difference/adjustment, where the 
adjustment takes care of forcing the index to the range 0..1, and handles 
varying total abundances / richnesses. If you have proportional data, you may 
not need the adjustment at all, but you can just use any index. That is, it 
does not matter so awfully much what index you use, and for many practical 
purposes it does not matter if data are proportional. Actually, several indices 
may be equal to each other with proportional data. For instance, Manhattan, 
Bray-Curtis and Kulczynski indices are all identical. All you need to decide is 
which name you use for your index -- numbers do not change.
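
A small base R check of this point, using two made-up proportion vectors and the index formulas as documented for vegdist() in vegan (written out by hand here, so treat them as a sketch): when both vectors sum to 1, Bray-Curtis and Kulczynski coincide, and the unscaled Manhattan distance is exactly twice that value, so the resulting rankings are the same.

```r
# Two hypothetical samples expressed as proportions (each sums to 1)
x <- c(0.10, 0.60, 0.20, 0.10)
y <- c(0.30, 0.30, 0.25, 0.15)

shared <- sum(pmin(x, y))   # abundance shared between the two samples

manhattan  <- sum(abs(x - y))
bray       <- sum(abs(x - y)) / sum(x + y)
kulczynski <- 1 - 0.5 * (shared / sum(x) + shared / sum(y))

print(c(manhattan = manhattan, bray = bray, kulczynski = kulczynski))
```

With these vectors, Bray-Curtis and Kulczynski both come out at 0.3 and Manhattan at 0.6.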

The analysis of proportional data usually covers very different classes of 
models than ANOSIM and friends. Dissimilarities are not usually involved in 
these models. One aspect in proportional data is that only M-1 of M variables 
really are independent. However, this really needs to be taken into account if 
M is low. I have no idea how that is in your case. 

Cheers, Jari Oksanen
On 13/05/2014, at 15:32 PM, Zbigniew Ziembik wrote:

 I am not sure, but it seems that your problem is related to
 compositional data analysis. You can probably use Aitchison distance to
 estimate separation between proportions.
 Take a (free) look at:
 http://www.leg.ufpr.br/lib/exe/fetch.php/pessoais:abtmartins:a_concise_guide_to_compositional_data_analysis.pdf.
 http://dugi-doc.udg.edu/bitstream/10256/297/1/CoDa-book.pdf.
 
 or (commercial):
 Aitchison, J. 2003. The Statistical Analysis of Compositional Data. The
 Blackburn Press.
 
 Best regards,
 ZZ
 
 
 On Mon, 2014-05-12 at 16:37 +, Javier Lenzi wrote:
 Dear all, 
 I'm doing data exploration on seabirds trophic ecology data and I am using 
 ANOSIM to evaluate possible differences in diet during breeding and 
 non-breeding seasons. As starting point I am using some classical indexes 
 such as %FO (relative frequency of occurrence), N (number of prey counted in 
 the pooled sample of pellets), %N (N as a percentage of the total number of 
 prey of all food types in the pooled sample), V (total volume of all prey in 
 the pooled sample), and IRI (index of relative importance). 
 I have a concern about which similarity measurement I should use in ANOSIM 
 for those indexes that are proportions. I am aware that, for instance, 
 Bray-Curtis is used for count data (e.g. N) and Jaccard is used for 
 presence-absence data (which I don't have); however, I did not find a proper 
 distance measurement for proportion data. Please, could you help me to find 
 a proper distance measurement for these proportion data? 
 Thank you very much in advance. Regards, Javier Lenzi