Re: [R] Abundance data ordination in R
Milton Cezar Ribeiro writes:
>
> Dear R-gurus
>
> I have a data.frame with abundance data for species and sites which looks
> like:
>
> mydf <- data.frame(
>   sp1 = sample(0:10, 5, replace = TRUE),
>   sp2 = sample(0:20, 5, replace = TRUE),
>   sp3 = sample(0:4, 5, replace = TRUE),
>   sp4 = sample(0:2, 5, replace = TRUE))
> rownames(mydf) <- paste("sites", 1:5, sep = "")
>
> I would like to run an ordination analysis on these data, and my worry is
> about the zeros (absences of species) in the matrix. From what I have read
> (Gotelli, A Primer of Ecological Statistics, 2004), with abundance data I
> cannot compute Euclidean distances, because the zeros mean absence of the
> species rather than a count of zero. Gotelli suggests using principal
> coordinates analysis. I would like to hear what you think about this, and
> which packages and functions are best for computing my distance matrices
> and running the ordination. Can I treat zeros as NA in my data.frame? Is
> there a good PDF book on multivariate analysis of abundance data available
> on the web?

Other people have already suggested what to do with these data and where to
find PDF texts. I will only comment on some points raised in the original
question.

Firstly, Euclidean distance is quite OK with zeros, or at least as good as
any other normal dissimilarity index is with zeros. Euclidean distance on
non-transformed data is poor for other reasons: it takes squared
differences, which emphasizes abundant species, and even when two sites have
nothing in common, the Euclidean distance varies with total abundances.
Using principal coordinates analysis does not change this, since it can also
be run on Euclidean distances. However, there are many packages providing
"better" dissimilarity indices, or transformations that make Euclidean
distances more useful (such as the Hellinger transformation).

Another question is more abstract: indeed, you may regard most zeros as
missing data. The species probably could occur at your sampling site, more
or less, but was too scarce to be observed. How to handle this in practice
is the tricky issue. You cannot simply change zeros to NA, since then the
dissimilarities (if they don't fail) will give a special significance to
these cells. Regarding them as zeros certainly makes more sense than
removing *pairs* of data where a species is NA in one site and present in
another. There are ways of handling zeros as something like missing values
of various degrees(!), but decency prohibits me from writing about these
methods.

cheers, jari oksanen

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
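The point about Euclidean distance and total abundances can be checked with
a few lines of base R. This is only a sketch on made-up counts (the matrix
comm and its site names are assumptions, not data from this thread): siteA
shares no species with siteB or siteC, and siteC is siteB with every count
multiplied by 20. The Hellinger transformation is written out by hand here;
vegan's decostand(x, "hellinger") should give the same transformed table.

```r
# Toy community matrix (made-up counts, not data from the thread).
# siteA shares no species with siteB or siteC; siteC is siteB scaled by 20.
comm <- rbind(
  siteA = c(sp1 = 10, sp2 = 0,  sp3 = 0,  sp4 = 0),
  siteB = c(sp1 = 0,  sp2 = 1,  sp3 = 1,  sp4 = 0),
  siteC = c(sp1 = 0,  sp2 = 20, sp3 = 20, sp4 = 0)
)

# Euclidean distance on the raw counts.
d_raw <- dist(comm)

# Hellinger transformation: square root of within-site relative abundances.
# Dividing a matrix by a vector of length nrow() recycles down the rows,
# so each row is divided by its own total.
hell <- sqrt(comm / rowSums(comm))
d_hell <- dist(hell)

print(round(d_raw, 2))
print(round(d_hell, 2))
```

On the raw counts, siteA is about 10.1 from siteB but 30 from siteC,
although it has nothing in common with either. After the transformation both
distances equal sqrt(2), the maximum for sites with no shared species, and
siteB and siteC coincide because they have the same relative composition.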
Re: [R] Abundance data ordination in R
Gavin Simpson wrote:
> In addition to the other suggestions, there is a Task View on CRAN for
> the topic of Environmetrics. This has a section describing the various
> ordination techniques available in R as well as functions to calculate
> distance/dissimilarity matrices:
>
> http://cran.r-project.org/src/contrib/Views/Environmetrics.html
>
> G

... And here are a couple of other suggestions:

1) Use a distance that does not treat double zeros (a species absent from
   both sites) as information. The Bray-Curtis distance is one of the most
   commonly used in such a case.

2) Possibly transform your data first, depending on the relative importance
   you want to give to rare species (typically, a log or double square root
   transformation increases the importance of rare species relative to
   abundant ones).

3) One approach is to use multidimensional scaling (see the MASS package) on
   the distance matrix to produce the ordination in two or three dimensions.
   See Venables & Ripley's MASS book for details.

4) Another alternative is correspondence analysis, which uses the chi-square
   distance and is suited to abundances (it is designed to analyze
   contingency tables, but a table of abundances, stations versus species,
   can be regarded as a two-way contingency table).

Best,

Philippe Grosjean
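Suggestions 1) to 3) can be strung together roughly as follows. This is a
sketch under assumptions, not a recommended recipe: the abundance table is
made up, bray_curtis() is a hypothetical helper written out only to show the
formula (vegan's vegdist() computes the same index by default), and
isoMDS() from MASS is one of several scaling functions that could be used
for step 3.

```r
library(MASS)  # isoMDS(); MASS ships with standard R installations

# Made-up abundance table (5 sites x 4 species), fixed so runs are repeatable.
comm <- matrix(
  c( 8,  0, 3, 1,
     5, 12, 0, 2,
     0, 20, 4, 0,
    10,  2, 1, 2,
     1,  7, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("site", 1:5), paste0("sp", 1:4))
)

# Hypothetical helper: Bray-Curtis dissimilarity between all pairs of rows.
# Sum of absolute differences divided by the sum of both site totals, so a
# species absent from both sites adds nothing to either term.
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

# Step 2: double square root transformation damps the dominant species.
# log1p(comm), i.e. log(x + 1), would be the log alternative that copes
# with zeros.
comm_t <- comm^(1/4)

# Step 1: Bray-Curtis on the transformed table.
d_bc <- bray_curtis(comm_t)

# Step 3: non-metric multidimensional scaling in two dimensions.
ord <- isoMDS(d_bc, k = 2, trace = FALSE)
print(ord$points)   # site coordinates
print(ord$stress)   # stress of the final configuration
```

Note that isoMDS() stops with an error if two different sites have a
dissimilarity of exactly zero, so identical rows would have to be merged
first.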
Re: [R] Abundance data ordination in R
On Sun, 2007-04-01 at 09:20 -0700, Milton Cezar Ribeiro wrote:
> I would like to hear what you think about this, and which packages and
> functions are best for computing my distance matrices and running the
> ordination.

In addition to the other suggestions, there is a Task View on CRAN for the
topic of Environmetrics. This has a section describing the various
ordination techniques available in R as well as functions to calculate
distance/dissimilarity matrices:

http://cran.r-project.org/src/contrib/Views/Environmetrics.html

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC                          [f] +44 (0)20 7679 0565
 UCL Department of Geography
 Pearson Building              [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street
 London, UK                    [w] http://www.ucl.ac.uk/~ucfagls/
 WC1E 6BT                      [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Re: [R] Abundance data ordination in R
There are many ways to do this, really. For example, if you use constrained
(~ canonical) correspondence analysis, the distance measure between sites is
chi-square and absences are not informative to the analysis. Or you can use
an ecological distance measure (indices like Soerensen, Bray-Curtis, Jaccard
and others) and perform principal coordinates analysis (= classical, metric
multidimensional scaling), etc.

Read the documentation and tutorials for the packages vegan, ade4 and
labdsv. You might start your search at the page of Jari Oksanen:
http://cc.oulu.fi/~jarioksa/softhelp/vegan.html
or the one from Dave Roberts:
http://ecology.msu.montana.edu/labdsv/R/

The vegan tutorial was useful for me to learn to use vegan:
http://cc.oulu.fi/~jarioksa/opetus/metodi/vegantutor.pdf

If you need more in-depth mathematical details, you should take a look at
Daniel Chessel's site:
http://pbil.univ-lyon1.fr/R/perso/pagechessel.html
There are plenty of PDFs available for download (some are suited to
beginners, others require more background knowledge).

Be warned: there is a large variety of techniques for multivariate analysis,
with different properties and weaknesses, and sometimes the most popular
analyses are not the most appropriate. Be sure of what you want and what you
are doing before you perform the analysis; the interpretation will depend on
the techniques applied.

I personally find that ade4 implements many different techniques but is
poorly documented, and some functionality is somewhat "hidden", while vegan
provides more information about its functions and is perfect for getting
started. I haven't used labdsv yet.
Hope this helps,

JR
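To make the link between correspondence analysis and the chi-square distance
concrete, here is a small base R sketch. It is the plain, unconstrained form
of correspondence analysis, not the constrained version mentioned above, and
it runs on a made-up abundance table; the object names (comm, site_scores,
d_chi) are assumptions for illustration. In practice one would presumably
call cca() in vegan or dudi.coa() in ade4 rather than write the
decomposition by hand.

```r
# Made-up abundance table (5 sites x 4 species); no empty rows or columns.
comm <- matrix(
  c( 8,  0, 3, 1,
     5, 12, 0, 2,
     0, 20, 4, 0,
    10,  2, 1, 2,
     1,  7, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("site", 1:5), paste0("sp", 1:4))
)

P  <- comm / sum(comm)   # table of relative frequencies
rw <- rowSums(P)         # site weights (row masses)
cw <- colSums(P)         # species weights (column masses)

# Standardized residuals from the independence model, then their SVD.
S   <- diag(1 / sqrt(rw)) %*% (P - rw %o% cw) %*% diag(1 / sqrt(cw))
dec <- svd(S)

# Site scores in principal coordinates (all axes kept).
site_scores <- diag(1 / sqrt(rw)) %*% dec$u %*% diag(dec$d)
rownames(site_scores) <- rownames(comm)

# Chi-square distances between site profiles, computed directly:
# Euclidean distance between row profiles after dividing each species
# column by the square root of its weight.
profiles <- comm / rowSums(comm)
d_chi <- dist(sweep(profiles, 2, sqrt(cw), "/"))

# Euclidean distances between the full set of site scores reproduce them.
print(all.equal(as.vector(dist(site_scores)), as.vector(d_chi)))
```

Plotting only the first two columns of site_scores gives the usual
two-dimensional ordination, which approximates these chi-square distances.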
Re: [R] Abundance data ordination in R
Hi,

There's a very good ordination web page by Mike Palmer aimed at ecologists
(and since you have a species x site matrix, I'm assuming that describes
you) at http://ordination.okstate.edu/

My recommendation is generally nonmetric multidimensional scaling (principal
coordinates analysis is a metric scaling ordination), with a dissimilarity
metric that doesn't consider joint absences, for example
Bray-Curtis/Sorensen.

Treating absent species as missing data is not a good idea, because while it
may not be possible to say that they are truly missing from that site
(depending on taxa and sampling methods), you at least know they aren't
common at that site. Ecological data are messy enough without discarding
information!

There are several R packages that may be helpful, including ecodist and
vegan.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org
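What "discarding information" means in practice can be seen with base R's
dist(), which leaves out every species that is NA in either site of a pair
and rescales what remains. A minimal sketch on two made-up sites (the counts
and object names are assumptions, not data from this thread):

```r
# Two made-up sites, three species.
comm <- rbind(
  siteA = c(sp1 = 5, sp2 = 0, sp3 = 3),
  siteB = c(sp1 = 5, sp2 = 4, sp3 = 0)
)

# Zeros kept as counts of zero.
d_zero <- dist(comm)

# Zeros recoded as missing values.
comm_na <- comm
comm_na[comm_na == 0] <- NA

# dist() skips sp2 and sp3 for this pair, because each is NA in one site.
d_na <- dist(comm_na)

print(d_zero)
print(d_na)
```

With zeros kept, the two sites are 5 units apart. With zeros recoded as NA,
only sp1 is compared, and the sites come out as identical (distance 0) even
though each holds a species the other lacks.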
[R] Abundance data ordination in R
Dear R-gurus

I have a data.frame with abundance data for species and sites which looks
like:

mydf <- data.frame(
  sp1 = sample(0:10, 5, replace = TRUE),
  sp2 = sample(0:20, 5, replace = TRUE),
  sp3 = sample(0:4, 5, replace = TRUE),
  sp4 = sample(0:2, 5, replace = TRUE))
rownames(mydf) <- paste("sites", 1:5, sep = "")

I would like to run an ordination analysis on these data, and my worry is
about the zeros (absences of species) in the matrix. From what I have read
(Gotelli, A Primer of Ecological Statistics, 2004), with abundance data I
cannot compute Euclidean distances, because the zeros mean absence of the
species rather than a count of zero. Gotelli suggests using principal
coordinates analysis. I would like to hear what you think about this, and
which packages and functions are best for computing my distance matrices and
running the ordination. Can I treat zeros as NA in my data.frame? Is there a
good PDF book on multivariate analysis of abundance data available on the
web?

Kind regards

Miltinho
Brazil