Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?

2012-12-14 Thread Stephen Sefick



On Fri 14 Dec 2012 06:51:56 AM CST, Gavin Simpson wrote:

On Fri, 2012-12-14 at 06:22 -0600, Stephen Sefick wrote:


a) Which ordination method would be better for my data : PCA knowing
that the represented inertia is 35.62% or NMDS with a stress value about
0.22?


My opinion is PCA on hellinger transformed relative proportions "means"
more than an NMDS


?? NMDS with Hellinger distances could optimise a k-D PCA with Hellinger
transform.


Gavin, maybe I have spoken beyond my knowledge.  My thought was that a
PCA has a unique solution and is therefore "better" (as long as an
appropriate distance is used that deals with the double zero problem
effectively).  I am sure that this is too simple for the reality of the
situation.  I don't know what a k-D PCA is.  Would you mind explaining
or directing me to some reading material?


By k-D PCA I meant that in nMDS you need to state the dimensionality; in
metaMDS() we start the process from a Principal Coordinates of the data
(PCoA == PCA when Euclidean distances used). I meant that nMDS for say
2d solutions can optimise the configuration arising from the first two
PCA axes.

I don't see the unique solution of PCA as an implicit advantage of that
method. It has a unique solution because the possible solutions are
constrained by the approach; linear combinations of the variables which
best approximate the Euclidean distances between samples. NMDS
generalises this idea extensively into a problem of best preserving the
mapping of the dissimilarities. As such it can do a better job of
drawing the map but that comes at a price.

Again though; horses for courses.



Given that NMDS essentially subsumes PCA I'm not sure what you are
getting at.


I don't understand.  Would you mind explaining this?
many thanks,


I meant in the sense that PCA is a special case of Principal Coordinates
and that nMDS generalises Principal coordinates.

I don't get the point of saying one method is "better" than any other.
Each has uses etc. I certainly don't think any one method "means" more
than the other.


Point taken.  As always, it depends on the question that you are trying 
to answer.  Thank you for the discussion and clarification.




G


Stephen



G


b) If NMDS is more adapted which one is the better? with Hellinger
normalization and Bray-Curtis distance, or with the normalization
recommended by Legendre and Legendre and Kulczynski distance ?


It sounds like the normalization you are referring to is relative
proportion which is si/sum(s); s is a vector of taxon at a site.


c) Is there other method to apply? I’m going to try co-inertia with
ade4 package




I am reading about co-inertia analysis now as it may be useful for some
of the things that I am planning on doing.  This method looks promising.

You are going to have to decide on what type of ordination to use with
COIA...

HTH,

Stephen


Thanks in advance.

Cheers.

Claire Della Vedova




[[alternative HTML version deleted]]



___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
--
Stephen Sefick
**
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama
36849
**
sas0...@auburn.edu
http://www.auburn.edu/~sas0025
**

Let's not spend our time and resources thinking about things that are so little 
or so large that all they really do for us is puff us up and make us feel like 
gods.  We are mammals, and have not exhausted the annoying little problems of 
being mammals.

   -K. Mullis

"A big computer, a complex algorithm and a long time does not equal science."

 -Robert Gentleman




Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?

2012-12-14 Thread Gavin Simpson
On Fri, 2012-12-14 at 06:22 -0600, Stephen Sefick wrote:

> >>> a) Which ordination method would be better for my data : PCA knowing
> >>> that the represented inertia is 35.62% or NMDS with a stress value about
> >>> 0.22?
> >>>
> >> My opinion is PCA on hellinger transformed relative proportions "means"
> >> more than an NMDS
> >
> > ?? NMDS with Hellinger distances could optimise a k-D PCA with Hellinger
> > transform.
> 
> Gavin, maybe I have spoken beyond my knowledge.  My thought was that a 
> PCA has a unique solution and is therefore "better" (as long as an 
> appropriate distance is used that deals with the double zero problem 
> effectively).  I am sure that this is too simple for the reality of the 
> situation.  I don't know what a k-D PCA is.  Would you mind explaining 
> or directing me to some reading material?

By k-D PCA I meant that in nMDS you need to state the dimensionality; in
metaMDS() we start the process from a Principal Coordinates of the data
(PCoA == PCA when Euclidean distances used). I meant that nMDS for say
2d solutions can optimise the configuration arising from the first two
PCA axes.
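
A minimal base-R sketch of the "PCoA == PCA when Euclidean distances used"
point, assuming a small made-up data matrix `x` (not data from this thread).
It compares prcomp() site scores with cmdscale() coordinates on the first two
axes; an axis may come out with the opposite sign, so absolute values are
compared.

```r
# Made-up abundance-like matrix: 6 sites (rows) by 4 variables (columns).
x <- matrix(c(12, 0, 3, 7,
               5, 2, 0, 9,
               0, 8, 1, 4,
               3, 3, 6, 0,
               9, 1, 2, 2,
               1, 7, 5, 6), nrow = 6, byrow = TRUE)

pca  <- prcomp(x)                 # PCA: centred, unscaled (covariance matrix)
pcoa <- cmdscale(dist(x), k = 2)  # PCoA of Euclidean distances, 2 axes

# Site scores on axes 1-2 agree up to an arbitrary sign flip per axis.
same <- isTRUE(all.equal(abs(unname(pca$x[, 1:2])), abs(unname(pcoa)),
                         check.attributes = FALSE))
print(same)
```

An nMDS run in 2 dimensions could then take such a configuration as its
starting point and adjust it; that step is not shown here.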

I don't see the unique solution of PCA as an implicit advantage of that
method. It has a unique solution because the possible solutions are
constrained by the approach; linear combinations of the variables which
best approximate the Euclidean distances between samples. NMDS
generalises this idea extensively into a problem of best preserving the
mapping of the dissimilarities. As such it can do a better job of
drawing the map but that comes at a price.

Again though; horses for courses.

> >
> > Given that NMDS essentially subsumes PCA I'm not sure what you are
> > getting at.
> 
> I don't understand.  Would you mind explaining this?
> many thanks,

I meant in the sense that PCA is a special case of Principal Coordinates
and that nMDS generalises Principal coordinates.

I don't get the point of saying one method is "better" than any other.
Each has uses etc. I certainly don't think any one method "means" more
than the other.

G

> Stephen
> 
> >
> > G
> >
> >>> b) If NMDS is more adapted which one is the better? with Hellinger
> >>> normalization and Bray-Curtis distance, or with the normalization
> >>> recommended by Legendre and Legendre and Kulczynski distance ?
> >>>
> >> It sounds like the normalization you are referring to is relative
> >> proportion which is si/sum(s); s is a vector of taxon at a site.
> >>
> >>> c) Is there other method to apply? I’m going to try co-inertia with
> >>> ade4 package
> >>>
> >>>
> >>>
> >> I am reading about co-inertia analysis now as it may be useful for some
> >> of the things that I am planning on doing.  This method looks promising.
> >>
> >> You are going to have to decide on what type of ordination to use with
> >> COIA...
> >>
> >> HTH,
> >>
> >> Stephen
> >>
> >>> Thanks in advance.
> >>>
> >>> Cheers.
> >>>
> >>> Claire Della Vedova


Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?

2012-12-14 Thread Stephen Sefick

On Fri 14 Dec 2012 05:08:32 AM CST, Gavin Simpson wrote:

On Thu, 2012-12-13 at 14:03 -0600, Stephen Sefick wrote:


My aim was to study how the distribution of species is linked with
environmental data.

Firstly, I did a PCA (with vegan library), using a Hellinger
transformation,
with commands like this :

acp1<-rda(decostand(myDataSpec[,c(25:62)], "hellinger"))




Is the Hellinger transform done on relative proportions?


The transformation includes division by the row sum and hence
conversion to proportions. As such it can be applied to count data or
relative abundance data; with the latter the division by row sum will
have no effect and then the transformation collapses to a simple square
root transformation of the proportional abundance data.

This is one of the reasons for the apparent contradictions over the
utility of the chord distance in ecological and palaeoecological
disciplines. In the latter we commonly use proportional data whilst
count abundances are common in the former. Directly applying the chord
distance to count abundances carries with it the baggage of the
Euclidean distance (squared differences emphasise the big things). But
chord distance applied to proportional data *is* the Hellinger distance
and hence palaeoecologists have found the chord distance a useful
dissimilarity coefficient in their field.




a) Which ordination method would be better for my data : PCA knowing
that the represented inertia is 35.62% or NMDS with a stress value about
0.22?


My opinion is PCA on hellinger transformed relative proportions "means"
more than an NMDS


?? NMDS with Hellinger distances could optimise a k-D PCA with Hellinger
transform.


Gavin, maybe I have spoken beyond my knowledge.  My thought was that a 
PCA has a unique solution and is therefore "better" (as long as an 
appropriate distance is used that deals with the double zero problem 
effectively).  I am sure that this is too simple for the reality of the 
situation.  I don't know what a k-D PCA is.  Would you mind explaining 
or directing me to some reading material?




Given that NMDS essentially subsumes PCA I'm not sure what you are
getting at.


I don't understand.  Would you mind explaining this?
many thanks,

Stephen



G


b) If NMDS is more adapted which one is the better? with Hellinger
normalization and Bray-Curtis distance, or with the normalization
recommended by Legendre and Legendre and Kulczynski distance ?


It sounds like the normalization you are referring to is relative
proportion which is si/sum(s); s is a vector of taxon at a site.


c) Is there other method to apply? I’m going to try co-inertia with
ade4 package




I am reading about co-inertia analysis now as it may be useful for some
of the things that I am planning on doing.  This method looks promising.

You are going to have to decide on what type of ordination to use with
COIA...

HTH,

Stephen


Thanks in advance.

Cheers.

Claire Della Vedova





Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?

2012-12-14 Thread Gavin Simpson
On Thu, 2012-12-13 at 14:03 -0600, Stephen Sefick wrote:

> > My aim was to study how the distribution of species is linked with
> > environmental data.
> >
> > Firstly, I did a PCA (with vegan library), using a Hellinger 
> > transformation,
> > with commands like this :
> >
> > acp1<-rda(decostand(myDataSpec[,c(25:62)], "hellinger"))
> >
> >
> 
> Is the Hellinger transform done on relative proportions?

The transformation includes division by the row sum and hence
conversion to proportions. As such it can be applied to count data or
relative abundance data; with the latter the division by row sum will
have no effect and then the transformation collapses to a simple square
root transformation of the proportional abundance data.
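
A minimal base-R sketch of this, assuming a made-up count matrix `counts`
and a hand-written helper `hellinger()` (both hypothetical; in the thread the
transformation is done with vegan's decostand(x, "hellinger")). It checks that
counts and relative abundances give the same result, and that for proportions
the transformation is just a square root.

```r
# Hellinger transformation by hand: square root of row-wise proportions.
# Rows are sites, columns are species.
hellinger <- function(m) sqrt(m / rowSums(m))

counts <- matrix(c(10, 0, 30,
                    4, 4,  8,
                    0, 5, 15), nrow = 3, byrow = TRUE)  # made-up counts
props  <- counts / rowSums(counts)                      # relative abundances

h_counts <- hellinger(counts)
h_props  <- hellinger(props)  # row sums are already 1, so this is only sqrt()

print(all.equal(h_counts, h_props))      # same result from counts or proportions
print(all.equal(h_props, sqrt(props)))   # collapses to a square root
```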

This is one of the reasons for the apparent contradictions over the
utility of the chord distance in ecological and palaeoecological
disciplines. In the latter we commonly use proportional data whilst
count abundances are common in the former. Directly applying the chord
distance to count abundances carries with it the baggage of the
Euclidean distance (squared differences emphasise the big things). But
chord distance applied to proportional data *is* the Hellinger distance
and hence palaeoecologists have found the chord distance a useful
dissimilarity coefficient in their field.
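
One way to read the chord/Hellinger remark, sketched in base R under stated
assumptions: `chord()` and `hellinger()` are hand-written helpers and `counts`
is made up. The chord transformation scales each site to unit length; applied
to square-rooted abundances it gives exactly the Hellinger transformation,
whereas applied to raw counts it lets the most abundant taxa dominate.

```r
chord     <- function(m) m / sqrt(rowSums(m^2))  # scale each row to unit length
hellinger <- function(m) sqrt(m / rowSums(m))    # sqrt of row-wise proportions

counts <- matrix(c(100,  1, 2,
                    10,  8, 9,
                     1, 50, 3), nrow = 3, byrow = TRUE)  # made-up counts

d_chord_raw  <- dist(chord(counts))        # chord distance on raw counts
d_chord_sqrt <- dist(chord(sqrt(counts)))  # chord distance on sqrt counts
d_hellinger  <- dist(hellinger(counts))    # Hellinger distance

# Chord distance on square-rooted data equals the Hellinger distance.
print(all.equal(as.vector(d_chord_sqrt), as.vector(d_hellinger)))

# Raw-count chord distances differ from the Hellinger distances.
print(round(rbind(chord_raw = as.vector(d_chord_raw),
                  hellinger = as.vector(d_hellinger)), 3))
```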


> >
> > a) Which ordination method would be better for my data : PCA knowing
> > that the represented inertia is 35.62% or NMDS with a stress value about
> > 0.22?
> >
> My opinion is PCA on hellinger transformed relative proportions "means" 
> more than an NMDS

?? NMDS with Hellinger distances could optimise a k-D PCA with Hellinger
transform.

Given that NMDS essentially subsumes PCA I'm not sure what you are
getting at.

G

> > b) If NMDS is more adapted which one is the better? with Hellinger
> > normalization and Bray-Curtis distance, or with the normalization
> > recommended by Legendre and Legendre and Kulczynski distance ?
> >
> It sounds like the normalization you are referring to is relative 
> proportion which is si/sum(s); s is a vector of taxon at a site.
> 
> > c) Is there other method to apply? I’m going to try co-inertia with
> > ade4 package
> >
> >
> >
> I am reading about co-inertia analysis now as it may be useful for some 
> of the things that I am planning on doing.  This method looks promising.
> 
> You are going to have to decide on what type of ordination to use with 
> COIA...
> 
> HTH,
> 
> Stephen
> 
> > Thanks in advance.
> >
> > Cheers.
> >
> > Claire Della Vedova


Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?

2012-12-14 Thread Jari Oksanen
Claire, Here some small comments
On 13/12/2012, at 17:24 PM, claire della vedova wrote:

> Dear all,
> 
> 
> 
> a)  Which ordination method would be better for my data : PCA knowing
> that the represented inertia is 35.62% or NMDS  with a stress value about
> 0.22? 

These numbers cannot be used to say which of these methods is better. You need 
other criteria. Some people may have strong opinions on the choice here, but 
these opinions cannot be based on these numbers -- they are based on something 
else (I do have such an opinion, but I abstain from expressing my opinion).

> 
> b)  If NMDS is more adapted which one is the better? with Hellinger
> normalization and Bray-Curtis distance, or with the normalization
> recommended by Legendre and Legendre  and Kulczynski distance ?
> 
Hellinger transformation was suggested for Euclidean metric, and normally it is 
used in PCA/RDA (which are based on Euclidean metric although they do not 
explicitly calculate Euclidean distances). I haven't heard of any advantages of 
Hellinger transformation with Bray-Curtis dissimilarity. I suggest you don't 
use it with Bray-Curtis. I don't know if Kulczyński dissimilarity is any better 
than, say, Sørensen dissimilarity (and both seem to be difficult to spell), but 
certainly it belongs to the same group of usually well behaving dissimilarities 
as variants of Bray-Curtis or Jaccard.
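
For orientation, a minimal base-R sketch of two of these dissimilarities for
a pair of sites. The helper names and the abundance vectors are made up; the
formulas are meant to match the "bray" and "kulczynski" definitions documented
for vegan's vegdist(), which is the function one would normally use.

```r
# Bray-Curtis: summed absolute differences over summed abundances.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Kulczynski: one minus the mean of the shared abundance expressed as a
# fraction of each site's total.
kulczynski <- function(x, y) {
  shared <- sum(pmin(x, y))
  1 - 0.5 * (shared / sum(x) + shared / sum(y))
}

a  <- c(10,  0, 5,  5, 0)  # made-up site, total 20
b  <- c( 5,  5, 0, 10, 0)  # same total as a
c2 <- c(10, 10, 0, 20, 0)  # twice the total of a

bc_ab <- bray_curtis(a, b);  ku_ab <- kulczynski(a, b)
bc_ac <- bray_curtis(a, c2); ku_ac <- kulczynski(a, c2)
print(c(bc_ab = bc_ab, ku_ab = ku_ab, bc_ac = bc_ac, ku_ac = ku_ac))

# A species absent from both sites (the last column) changes neither index.
print(bray_curtis(a[-5], b[-5]) == bc_ab)
```

With equal site totals the two indices coincide in this example; they differ
once the totals differ.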

> c)   Is there other method to apply? I’m going to try co-inertia with
> ade4 package
> 
> 
Certainly there is a high number of methods you can apply, but why? What you 
try to analyse? What are your questions?

Cheers, Jari Oksanen
-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa





Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?

2012-12-14 Thread Jari Oksanen
Hello Folks,

Kruskal's "rule of thumb" really is a rule of thumb. That is, it is intended 
as a rough guideline. In that sense, it does not differ from Clarke's rules. 
However, I wouldn't judge usability simply by stress: solutions with very low 
stress can be useless and solutions with fairly high stress can be usable. 
Stress reflects many things, but a large part of it behaves like a 
signal/noise ratio. The signal is more difficult to detect with high noise, 
but if you detect the signal, the amount of noise does not matter. I have 
quite often seen pretty usable solutions with stress around or above 0.20 
(20%), at least when using external explanatory variables. There are limits, 
though. If you trace single runs, you may see that random starting 
configurations typically start with stress 0.4 (40%) or a bit higher. If you 
cannot improve from that, the solution is probably pretty useless (and with 
metaMDS you will probably have no convergent solutions). However, instead of 
discarding the results, you may first try stricter convergence criteria for 
monoMDS (if you use monoMDS). See its help pages (the next version of vegan 
will have a stricter limit for the "scale factor of gradient", sfgrmin). 

There is also a limit for low stress. In fact, the current vegan warns of too 
low stress (Kruskal's "perfect" fit). This is usually a symptom of insufficient 
data (too many dimensions for too few points, dissimilarities found from too 
few variables).

In my opinion, ecologists are often too obsessed with goodness-of-fit values. 
This is true in general, but also very manifest with multivariate methods. I 
do think that if you, say, in PCA or RDA "explain" something like >50%, there 
is something suspect in your analysis. Typical reasons are insufficient data 
(too few rows or columns) or not really multivariate data. Sometimes there 
are some very dominant species (high variance) so that the analysis need only 
care about a couple of species, and that is an easy task. If you transform 
your data so that high abundances are squashed down and variances made more 
even, or even made equal, the data become more multivariate (= all species 
count). Typically this means that a lower proportion of variance is 
"explained", but often the results are more interpretable. This also happens 
when you change models: unscaled PCA/RDA using variances "explains" much of 
the variance, scaled PCA/RDA using correlations "explains" much less, and 
CA/CCA studying deviations from expectations "explains" the least. Typically 
the usability and interpretability of the results improve as "explanatory 
power" decreases. The same also often holds for NMDS: Euclidean distances 
often give lower stress and poorer results than dissimilarities that treat 
all species more equally.
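
A minimal base-R sketch of the unscaled versus scaled PCA contrast, assuming
a made-up site-by-species matrix `x` in which one species (sp1) has far
higher variance than the rest. prcomp() stands in here for the PCA; the
helper `prop_axis1()` is hypothetical.

```r
# Made-up data: 8 sites, 4 species; sp1 dominates the total variance.
x <- cbind(sp1 = c(10, 200, 50, 400, 120, 300, 80, 250),
           sp2 = c(1, 2, 3, 4, 5, 6, 7, 8),
           sp3 = c(3, 1, 4, 1, 5, 9, 2, 6),
           sp4 = c(2, 7, 1, 8, 2, 8, 1, 8))

# Proportion of total variance carried by the first axis.
prop_axis1 <- function(p) p$sdev[1]^2 / sum(p$sdev^2)

unscaled <- prop_axis1(prcomp(x))                 # PCA on variances
scaled   <- prop_axis1(prcomp(x, scale. = TRUE))  # PCA on correlations

print(round(c(unscaled = unscaled, scaled = scaled), 3))
```

In the unscaled analysis the first axis is essentially sp1 alone, so it
"explains" nearly everything; after scaling, all four species contribute and
the first axis carries a smaller share.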

Not really R, but perhaps I'm forgiven (this time),

Cheers, Jari Oksanen  

From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] 
on behalf of Alan Haynes [aghay...@gmail.com]
Sent: 14 December 2012 09:53
To: sas0...@auburn.edu
Cc: claire della vedova; r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] pca or nmds (with which normalization and distance ) 
for abundance data ?

Hi Claire,

I'm not sure if it helps, and it might be interesting to hear other list
readers' views on the subject, but McCune and Grace, the authors of PCOrd
and "Analysis of Ecological Communities", have a couple of rules of thumb
for NMDS stress.
They use Kruskal stress*100, while I believe monoMDS (and thus metaMDS)
uses simple Kruskal stress (values in brackets below are thus the values
vegan could report).

"Kruskal's rules of thumb"
2.5 (or 0.025) = excellent
5 (0.05) = good
10 (0.1) = fair
20 (0.2) = poor

"Clarke's rules of thumb"
<5 (0.05) - excellent, cannot be misinterpreted, but incredibly rare in
practice
5-10 (0.05 - 0.1) - good no real risk of false inference
10-20 (0.1 - 0.2) - can be usable, but upper values could be misleading.
plot details should not be used
>20 (0.2) - plots likely to be dangerous to interpret. Stresses of >~35,
samples are more or less randomly placed with little regard for ranking.


Correspondingly, McCune and Grace would probably err on the side of caution
as 0.22 is getting into the poor fit, dangerous to interpret areas.

It would be interesting to hear other NMDS users views on this...what
stress do you consider too high, when does an ordination become
(essentially) useless etc.


HTH

Cheers,

Alan







--
Email: aghay...@gmail.com
Mobile: +41794385586
Skype: aghaynes



On 13 December 2012 21:03, Stephen Sefick  wrote:

>
>
> On Thu 13 Dec 2012 09:24:41 AM CST, claire della vedova wrote:
>
>>
>> Dear all,
>>
>> I’m a biostatistician working for a French institute involved in
>> environmental risk assessment, and I would need help to understand the
>> results I obtained from several ordinati