Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?
On Fri 14 Dec 2012 06:51:56 AM CST, Gavin Simpson wrote:
> I meant in the sense that PCA is special case of Principal Coordinates
> and that nMDS generalises Principal coordinates. I don't get the point of
> saying one method is "better" than any other. Each has uses etc.
> I certainly don't think any one method "means" more than the other.

Point taken. As always, it depends on the question that you are trying to answer. Thank you for the discussion and clarification.

--
Stephen Sefick
**
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama
36849
**
sas0...@auburn.edu
http://www.auburn.edu/~sas0025
**

Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals.

-K. Mullis

"A big computer, a complex algorithm and a long time does not equal science."

-Robert Gentleman

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?
On Fri, 2012-12-14 at 06:22 -0600, Stephen Sefick wrote:
> >>> a) Which ordination method would be better for my data : PCA knowing
> >>> that the represented inertia is 35.62% or NMDS with a stress value about
> >>> 0.22?
> >>>
> >> My opinion is PCA on hellinger transformed relative proportions "means"
> >> more than an NMDS
> >
> > ?? NMDS with Hellinger distances could optimise a k-D PCA with Hellinger
> > transform.
>
> Gavin, maybe I have spoken beyond my knowledge. My thought was that a
> PCA has a unique solution and is therefore "better" (as long as an
> appropriate distance is used that deals with the double zero problem
> effectively). I am sure that this is too simple for the reality of the
> situation. I don't know what a k-D PCA is. Would you mind explaining
> or directing me to some reading material?

By k-D PCA I meant that in nMDS you need to state the dimensionality; in metaMDS() we start the process from a Principal Coordinates Analysis of the data (PCoA == PCA when Euclidean distances are used). I meant that nMDS for, say, 2-d solutions can optimise the configuration arising from the first two PCA axes.

I don't see the unique solution of PCA as an implicit advantage of that method. It has a unique solution because the possible solutions are constrained by the approach: linear combinations of the variables which best approximate the Euclidean distances between samples. NMDS generalises this idea extensively into a problem of best preserving the mapping of the dissimilarities. As such it can do a better job of drawing the map, but that comes at a price. Again though: horses for courses.

> >
> > Given that NMDS essentially subsumes PCA I'm not sure what you are
> > getting at.
>
> I don't understand. Would you mind explaining this?
> many thanks,

I meant in the sense that PCA is a special case of Principal Coordinates and that nMDS generalises Principal Coordinates. I don't get the point of saying one method is "better" than any other. Each has uses etc.
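A minimal sketch of the PCA / PCoA / NMDS relationship described above, assuming simulated Poisson counts in place of real data (the objects `comm`, `hell`, `pca2`, `pcoa2` and `nmds` are hypothetical). It computes the Hellinger transformation in base R, checks that Principal Coordinates of the Euclidean distances reproduce the first two PCA axes up to sign, and then starts a non-metric MDS from that 2-D PCA configuration. `MASS::isoMDS()` is used here as a stand-in for vegan's `metaMDS()`/`monoMDS()` so that only packages shipped with R are needed; the implementations differ in their details.

```r
library(MASS)  # isoMDS(); MASS is a recommended package shipped with R

set.seed(42)
# Hypothetical community table: 20 sites x 8 species of Poisson counts
comm <- matrix(rpois(20 * 8, lambda = 10), nrow = 20)

# Hellinger transformation: square root of row-wise relative proportions
hell <- sqrt(comm / rowSums(comm))

# 2-D PCA: first two axes of a centred, unscaled PCA
pca2 <- prcomp(hell, center = TRUE, scale. = FALSE)$x[, 1:2]

# Principal Coordinates (classical metric scaling) of the Euclidean distances
# between Hellinger-transformed sites, i.e. of the Hellinger distances
d <- dist(hell)
pcoa2 <- cmdscale(d, k = 2)

# With Euclidean distances PCoA reproduces PCA, up to the sign of each axis
pcoa_equals_pca <- isTRUE(all.equal(abs(unname(pca2)), abs(unname(pcoa2)),
                                    tolerance = 1e-6))
print(pcoa_equals_pca)

# Non-metric MDS on the same distances, started from the 2-D PCA
# configuration; isoMDS moves the points to improve the rank-order
# (monotone) fit between configuration distances and dissimilarities
nmds <- isoMDS(d, y = pca2, k = 2, trace = FALSE)
print(nmds$stress)  # isoMDS reports stress in percent
```

Note that `isoMDS()` reports stress in percent, whereas vegan's `monoMDS()` reports it as a proportion, so a value printed here has to be divided by 100 before it is compared with a vegan stress such as 0.22.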
I certainly don't think any one method "means" more than the other.

G
Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?
On Fri 14 Dec 2012 05:08:32 AM CST, Gavin Simpson wrote:
> On Thu, 2012-12-13 at 14:03 -0600, Stephen Sefick wrote:
>>> a) Which ordination method would be better for my data : PCA knowing
>>> that the represented inertia is 35.62% or NMDS with a stress value about
>>> 0.22?
>>
>> My opinion is PCA on hellinger transformed relative proportions "means"
>> more than an NMDS
>
> ?? NMDS with Hellinger distances could optimise a k-D PCA with Hellinger
> transform.

Gavin, maybe I have spoken beyond my knowledge. My thought was that a PCA has a unique solution and is therefore "better" (as long as an appropriate distance is used that deals with the double zero problem effectively). I am sure that this is too simple for the reality of the situation. I don't know what a k-D PCA is.
Would you mind explaining or directing me to some reading material?

> Given that NMDS essentially subsumes PCA I'm not sure what you are
> getting at.

I don't understand. Would you mind explaining this?

many thanks,

Stephen
Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?
On Thu, 2012-12-13 at 14:03 -0600, Stephen Sefick wrote:
> > My aim was to study how the distribution of species is linked with
> > environmental data.
> >
> > Firstly, I did a PCA (with vegan library), using a Hellinger
> > transformation,
> > with commands like this :
> >
> > acp1<-rda(decostand(myDataSpec[,c(25:62)], "hellinger"))
> >
>
> Is the Hellinger transform done on relative proportions?

The transformation includes division by the row sum and hence conversion to proportions. As such it can be applied to count data or relative abundance data; with the latter the division by row sum will have no effect and then the transformation collapses to a simple square root transformation of the proportional abundance data.

This is one of the reasons for the apparent contradictions over the utility of the chord distance in ecological and palaeoecological disciplines. In the latter we commonly use proportional data whilst count abundances are common in the former. Directly applying the chord distance to count abundances carries with it the baggage of the Euclidean distance (squared differences emphasise the big things). But chord distance applied to proportional data *is* the Hellinger distance and hence palaeoecologists have found the chord distance a useful dissimilarity coefficient in their field.

> > a) Which ordination method would be better for my data : PCA knowing
> > that the represented inertia is 35.62% or NMDS with a stress value about
> > 0.22?
>
> My opinion is PCA on hellinger transformed relative proportions "means"
> more than an NMDS

?? NMDS with Hellinger distances could optimise a k-D PCA with Hellinger transform.

Given that NMDS essentially subsumes PCA I'm not sure what you are getting at.

G

> > b) If NMDS is more adapted which one is the better? with Hellinger
> > normalization and Bray-Curtis distance, or with the normalization
> > recommended by Legendre and Legendre and Kulcynski distance ?
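The identities described above can be checked numerically. This is a small sketch on a hypothetical count table; `hellinger()` and `sq_chord()` are illustrative helper names, not package functions. The "chord distance" is assumed to be the palaeoecological squared chord distance, the sum over taxa of (sqrt(p_ik) - sqrt(p_jk))^2 computed on proportions, so the Hellinger distance is its square root.

```r
# Hypothetical count table: 4 sites (rows) x 5 taxa (columns)
counts <- matrix(c(10, 0, 5, 3, 2,
                    0, 8, 1, 1, 0,
                   20, 2, 0, 6, 4,
                    1, 1, 1, 1, 1),
                 nrow = 4, byrow = TRUE)

# Hellinger transformation: divide by row sums, then take square roots
hellinger <- function(x) sqrt(x / rowSums(x))

# Relative proportions, s_i / sum(s) for each site
props <- counts / rowSums(counts)

h_counts <- hellinger(counts)  # applied to counts
h_props  <- hellinger(props)   # applied to proportions: row sums are already 1,
h_sqrt   <- sqrt(props)        # so it reduces to a plain square root

# Squared chord distance in its palaeoecological form (assumed definition)
sq_chord <- function(p) {
  n <- nrow(p)
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- sum((sqrt(p[i, ]) - sqrt(p[j, ]))^2)
    }
  }
  out
}

# Hellinger distance: Euclidean distance between Hellinger-transformed sites
hell_dist <- as.matrix(dist(h_counts))

all.equal(h_counts, h_props)                                       # TRUE
all.equal(h_props, h_sqrt)                                         # TRUE
all.equal(hell_dist^2, sq_chord(props), check.attributes = FALSE)  # TRUE
```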
>
> It sounds like the normalization you are referring to is relative
> proportion, which is s_i/sum(s), where s is the vector of taxon abundances at a site.
>
> > c) Is there other method to apply? I’m going to try co-inertia with
> > ade4 package
>
> I am reading about co-inertia analysis now as it may be useful for some
> of the things that I am planning on doing. This method looks promising.
>
> You are going to have to decide on what type of ordination to use with
> COIA...
>
> HTH,
>
> Stephen
>
> > Thanks in advance.
> >
> > Cheers.
> >
> > Claire Della Vedova
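Since co-inertia came up, here is a rough base-R sketch of what it computes in the simplest case, assuming both tables are analysed by centred PCA with uniform site weights 1/n: the co-inertia axes come from the singular value decomposition of the cross-covariance matrix between the two tables, and the RV coefficient summarises their overall association. This is not the `ade4::coinertia()` implementation, which works from whatever `dudi` ordination is chosen for each table (the choice Stephen mentions) and may scale eigenvalues differently; `coinertia_sketch()` and the simulated tables are hypothetical.

```r
# Co-inertia of two tables measured on the same sites, assuming both are
# analysed by centred PCA with uniform site weights 1/n
coinertia_sketch <- function(X, Y) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  Yc <- scale(Y, center = TRUE, scale = FALSE)
  n  <- nrow(Xc)

  cross <- crossprod(Xc, Yc) / n   # p x q cross-covariance matrix
  s <- svd(cross)                  # co-inertia axes from its SVD

  # RV coefficient: similarity of the two site-by-site configurations
  Wx <- tcrossprod(Xc)
  Wy <- tcrossprod(Yc)
  rv <- sum(Wx * Wy) / sqrt(sum(Wx * Wx) * sum(Wy * Wy))

  list(eig = s$d^2,            # co-inertia eigenvalues (squared covariances)
       x_axes = s$u,           # loadings of the X variables
       y_axes = s$v,           # loadings of the Y variables
       x_scores = Xc %*% s$u,  # site scores from each table
       y_scores = Yc %*% s$v,
       RV = rv)
}

set.seed(7)
# Hypothetical tables: Hellinger-transformed counts and 3 environmental variables
counts <- matrix(rpois(25 * 6, lambda = 4), nrow = 25)
spp <- sqrt(counts / rowSums(counts))
env <- matrix(rnorm(25 * 3), nrow = 25)

res <- coinertia_sketch(spp, env)
res$eig  # one eigenvalue per co-inertia axis: min(ncol(spp), ncol(env)) = 3
res$RV
```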
Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?
Claire,

Here are some small comments.

On 13/12/2012, at 17:24 PM, claire della vedova wrote:

> Dear all,
>
> a) Which ordination method would be better for my data : PCA knowing
> that the represented inertia is 35.62% or NMDS with a stress value about
> 0.22?

These numbers cannot be used to say which of these methods is better. You need other criteria. Some people may have strong opinions on the choice here, but these opinions cannot be based on these numbers -- they are based on something else (I do have such an opinion, but I abstain from expressing my opinion).

> b) If NMDS is more adapted which one is the better? with Hellinger
> normalization and Bray-Curtis distance, or with the normalization
> recommended by Legendre and Legendre and Kulcynski distance ?

Hellinger transformation was suggested for the Euclidean metric, and normally it is used in PCA/RDA (which are based on the Euclidean metric although they do not explicitly calculate Euclidean distances). I haven't heard of any advantages of Hellinger transformation with Bray-Curtis dissimilarity. I suggest you don't use it with Bray-Curtis. I don't know if Kulczyński dissimilarity is any better than, say, Sørensen dissimilarity (and both seem to be difficult to spell), but certainly it belongs to the same group of usually well-behaved dissimilarities as variants of Bray-Curtis or Jaccard.

> c) Is there other method to apply? I’m going to try co-inertia with
> ade4 package

Certainly there is a large number of methods you can apply, but why? What are you trying to analyse? What are your questions?

Cheers, Jari Oksanen

--
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa
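For reference, a base-R sketch of the dissimilarities mentioned above, written from the formulas given on the `vegan::vegdist()` help page. The function names are hypothetical; in practice `vegdist(x, "bray")` or `vegdist(x, "kulczynski")` would be used. Sørensen is computed here as Bray-Curtis on presence/absence data.

```r
# Bray-Curtis: sum of absolute differences over the sum of all abundances
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Kulczynski: 1 minus the mean of the two "shared abundance" fractions
kulczynski <- function(x, y) {
  shared <- sum(pmin(x, y))
  1 - 0.5 * (shared / sum(x) + shared / sum(y))
}

# Sorensen: Bray-Curtis applied to presence/absence data
sorensen <- function(x, y) bray_curtis(as.numeric(x > 0), as.numeric(y > 0))

# Two hypothetical sites with abundances of five taxa
site_a <- c(10, 0, 5, 3, 2)
site_b <- c(20, 2, 0, 6, 4)

bray_curtis(site_a, site_b)  # 22 / 52 = 0.4230769
kulczynski(site_a, site_b)   # 1 - 0.5 * (15/20 + 15/32) = 0.390625
sorensen(site_a, site_b)     # 2 / 8 = 0.25
```

`vegdist()` also has a `binary = TRUE` argument, with which its "bray" method gives the Sørensen index.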
Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?
Hello Folks,

Kruskal's "rule of thumb" really is a rule of thumb. That is, it is intended as a rough guideline. In that sense, there is no difference to Clarke's rules. However, I wouldn't judge usability simply by stress: solutions with very low stress can be useless and solutions with fairly high stress can be usable. Stress depends on many things, but a large portion of stress is similar to a signal/noise ratio. The signal is more difficult to detect with high noise, but if you detect the signal, the amount of noise does not matter. I have quite often seen pretty usable solutions with stress around or above 0.20 (20%), at least when using external explanatory variables.

There are limits, though. If you trace single runs, you may see that random starting configurations typically start with stress 0.4 (40%) or a bit higher. If you cannot improve from that, the solution probably is pretty useless (and in metaMDS you will probably have no convergent solutions). However, instead of discarding the results, you may first try stricter convergence criteria for monoMDS (if you use monoMDS). See its help pages (the next version of vegan will have a stricter limit for the "scale factor of gradient", sfgrmin).

There is also a limit for low stress. In fact, the current vegan warns of too low stress (Kruskal's "perfect" fit). This is usually a symptom of insufficient data (too many dimensions for too few points, dissimilarities found from too few variables).

In my opinion, ecologists are often too much obsessed with goodness of fit values. This is true in general, but also very manifest with multivariate methods. I do think that if you, say, in PCA or RDA "explain" something like >50%, there is something suspect in your analysis. Typical reasons are insufficient data (too few rows or columns) or not really multivariate data.
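Jari's point about inflated "explained" variance can be illustrated with simulated data in which one hypothetical species is far more abundant, and hence far more variable, than the rest. This sketch compares the share of total variance on the first PCA axis for unscaled PCA (covariances), scaled PCA (correlations) and PCA after a Hellinger transformation; the data and the helper `first_axis_share()` are assumptions made for illustration only.

```r
set.seed(1)
# Hypothetical data: 30 sites x 10 species; species 1 is strongly dominant
comm <- matrix(rpois(30 * 10, lambda = 3), nrow = 30)
comm[, 1] <- rpois(30, lambda = 1000)

# Share of total variance carried by the first principal axis
first_axis_share <- function(pca) {
  ev <- pca$sdev^2  # eigenvalues = variances along the axes
  ev[1] / sum(ev)
}

shares <- c(
  unscaled  = first_axis_share(prcomp(comm, scale. = FALSE)),  # covariances
  scaled    = first_axis_share(prcomp(comm, scale. = TRUE)),   # correlations
  hellinger = first_axis_share(prcomp(sqrt(comm / rowSums(comm))))
)
round(shares, 3)
```

In this setup the unscaled PCA is expected to put nearly all the variance on the first axis, simply because one species carries almost all of it; scaling or transforming spreads the variance across species and lowers the figure. The exact shares depend on the simulated data.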
Sometimes there are some very dominant species (high variance) so that the analysis need not care about anything but a couple of species, and that is an easy task. If you transform your data so that high abundances are squashed down and variances equalized, or even made equal, the data become more multivariate (= all species count). Typically this means that a lower proportion of variance is "explained", but often the results are more interpretable. This also happens when you change models: unscaled PCA/RDA using variances "explains" much of the variance, scaled PCA/RDA using correlations "explains" much less, and CA/CCA studying deviations from expectations "explains" the least. Typically the usability and interpretability of the results improve as "explanatory power" decreases. The same also often holds for NMDS: Euclidean distances often give lower stress and poorer results than dissimilarities that treat all species more equally.

Not really R, but perhaps I'm forgiven (this time),

Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] on behalf of Alan Haynes [aghay...@gmail.com]
Sent: 14 December 2012 09:53
To: sas0...@auburn.edu
Cc: claire della vedova; r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?

Hi Claire,

I'm not sure if it helps (and it would be interesting to hear other list readers' views on the subject), but McCune and Grace, the authors of PC-ORD and "Analysis of Ecological Communities", have a couple of rules of thumb for NMDS stress. They use Kruskal stress*100, while I believe monoMDS (and thus metaMDS) uses simple Kruskal stress.
(Values in brackets below are thus the values vegan could report.)

"Kruskal's rules of thumb"
  2.5 (0.025) = excellent
  5 (0.05) = good
  10 (0.1) = fair
  20 (0.2) = poor

"Clarke's rules of thumb"
  <5 (0.05) - excellent, cannot be misinterpreted, but incredibly rare in practice
  5-10 (0.05-0.1) - good, no real risk of false inference
  10-20 (0.1-0.2) - can be usable, but upper values could be misleading; plot details should not be used
  >20 (0.2) - plots likely to be dangerous to interpret

At stresses above ~35 (0.35), samples are more or less randomly placed with little regard for ranking.

Correspondingly, McCune and Grace would probably err on the side of caution, as 0.22 is getting into the poor-fit, dangerous-to-interpret range.

It would be interesting to hear other NMDS users' views on this... what stress do you consider too high, when does an ordination become (essentially) useless, etc.

HTH

Cheers,
Alan

--
Email: aghay...@gmail.com
Mobile: +41794385586
Skype: aghaynes

On 13 December 2012 21:03, Stephen Sefick wrote: > > > On Thu 13 Dec 2012 09:24:41 AM CST, claire della vedova wrote: > >> >> Dear all, >> >> I’m a biostatistician working for a French institute involved in >> environmental risk assessment, and I would need help to understand the >> results I obtained from several ordinati