[R-sig-eco] Follow-up to Vegan metaMDS: unusual first run stress values with large data set

2016-05-02 Thread Ewan Isherwood
Hi r-sig-ecology!

This is mostly a message for Jari Oksanen or another Vegan developer that
may be working specifically with metaMDS, but I'm opening up to anyone that
has any interest in this. First of all, you can see my original post here:
http://r-sig-ecology.471788.n2.nabble.com/Vegan-metaMDS-unusual-first-run-stress-values-with-large-data-set-td7577720.html

Basically, I'm having the same issues with the metaMDS engine as above (R
3.2.5, Vegan 2.3-5). This time my dataset is larger at 9239 sites x 85
species.

I've tried pushing the sfgrmin value down to an absurd 1e-10,000,000
(decreasing this value to 1e-7 resolved the issue last time).

I've tried upping the sratmax to about 0.99 recurring with 77 9's (I don't
think this should have an effect since it's concerned with the iterations
stopping when the stress ratio between two iterations goes above the
inputted value)

I've tried using the Jaccard and Bray methods (I don't think this should
have an effect)

I've trialled 3-6 dimensions randomly (this has in the past affected the
result, but that might be because of other factors)

I have always used the noshare = TRUE option, because otherwise it ejects
some of the sampling points with rare species to astronomical values on one
or more axes.

I've tried iterations of this about 20-30 times but it still won't ever
give me a best solution that isn't the first run. Here is the basic code:

metaMDS(PSU.sp, k= x, distance = "jaccard", sfgrmin = x, sratmax = x,
noshare = TRUE)
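
For readers who want to reproduce the shape of this call, here is a minimal sketch with concrete values in place of the x placeholders. It assumes the dune data set shipped with vegan as a stand-in for PSU.sp (which is not public), and the values of k, trymax, sfgrmin and sratmax are illustrative only, not the ones used above. sfgrmin and sratmax are not metaMDS arguments as such; they are passed through to monoMDS.

```r
library(vegan)

# Stand-in community data shipped with vegan (20 sites x 30 species).
# This is an assumption for illustration; it is not the PSU.sp data.
data(dune)

set.seed(42)  # make the random starts reproducible

fit <- metaMDS(
  dune,
  k = 3,                 # number of dimensions (illustrative)
  distance = "jaccard",
  trymax = 50,           # allow more random starts than the default
  noshare = TRUE,        # use step-across if some sites share no species
  sfgrmin = 1e-7,        # monoMDS: stop when gradient scale factor drops below this
  sratmax = 0.999999,    # monoMDS: stop when stress ratio between iterations exceeds this
  trace = 0              # suppress the per-run stress printout
)

fit$stress  # stress of the best solution found
```

Setting trace = 1 instead prints the stress of every run, which is the output to watch when checking whether any random start ever beats run 0.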

I'm happy to privately share the raw dataset with Jari Oksanen if he's
interested in this phenomenon, but I would have to seek permission for
anyone else since I do not own it. In the meantime I will investigate other
methods to analyse this data, which shouldn't be an issue. Since my dataset
is unusually large for this method, this is probably more for curiosity's
sake for the Vegan developers.

Thanks for your help,

Ewan Isherwood


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[R-sig-eco] Vegan metaMDS: unusual first run stress values with large data set

2012-12-05 Thread Ewan Isherwood
Hello, R-Community! This is the first time writing to this group and
indeed the first time using a mailing list, so please bear with me if
I’ve done something wrong.

I have a large species x site matrix (89 x 4831) that I want to
ordinate using metaMDS in the Vegan (2.0-5) package in R (2.15.2). If
I run this data frame using the Jaccard index in two or more
dimensions (k > 1), the first run (run=0) has a relatively low stress
value and the other 20 runs are much higher and have very low
deviation. However, k=1 seems to work fine. Furthermore, a
stress/scree plot reveals a pyramid-like shape, where the k=1 lowest
stress value is low, increases rapidly for k=2 then decreases slowly
as k increases.

Dimensions  Stress
1           0.1382185
2           0.1939509
3           0.1695375
4           0.155221
5           0.1406408
6           0.1294149
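
A table like the one above can be produced by looping metaMDS over k and collecting the best stress for each. The following is a rough sketch, assuming the dune data set from vegan as a stand-in for the real data; the names dims and scree are made up for this example.

```r
library(vegan)

# Stand-in data; an assumption for illustration, not the PSU data.
data(dune)

set.seed(1)   # reproducible random starts
dims <- 1:6   # dimensions to try, matching the table above

# Best stress found by metaMDS for each number of dimensions
stress <- sapply(dims, function(k) {
  metaMDS(dune, distance = "jaccard", k = k, trymax = 20, trace = 0)$stress
})

scree <- data.frame(Dimensions = dims, Stress = stress)
print(scree)
```

Calling plot(scree$Dimensions, scree$Stress, type = "b") then draws the stress/scree plot. On a well-behaved data set one would expect stress to fall as k grows, so a rise from k = 1 to k = 2 is the part worth investigating.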

I’ve tried this with a smaller version of this data and this issue
arises at k > 2 rather than at k > 1 as it is here. Anyway, this is the
input and output:

library(vegan)
library(MASS)
PSU <- read.table("PSU.txt", header = TRUE, sep = "")
PSU.sp <- PSU[, 22:110]
PSU.NMDS <- metaMDS(PSU.sp, k = 4, zerodist = "add", distance = "jaccard")

Square root transformation
Wisconsin double standardization
Zero dissimilarities changed into  0.0006657301
Run 0 stress 0.155221
Run 1 stress 0.2548103
Run 2 stress 0.255434
Run 3 stress 0.2551382
… (Up to run 20 where run 1 through run 20 have all very similar stress values.)

Call:
metaMDS(comm = PSU.sp, distance = "jaccard", k = 4, zerodist = "add")

global Multidimensional Scaling using monoMDS

Data: wisconsin(sqrt(PSU.sp))
Distance: jaccard

Dimensions: 4
Stress: 0.155221
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on ‘wisconsin(sqrt(PSU.sp))’

Now, again, with k=1 this does not happen – the solution looks like
any other regular NMDS run. There are no blank values in the data as
they are all numbers between 0 and 100 corresponding to % cover, and
every row and column sum is greater than 0. There are many sites with
the same species configurations, hence the zerodist, but omitting this
makes no difference to the problem at hand. The NMDS works fine if I
use a subset of the data, but I have not subsetted and tested all of
it. Other metric (Euclidean) and nonmetric (Bray) dissimilarity
indices result in the same effect. I’ve chosen k=4 here because of the
(marginal) elbow in the stress plot, but the data itself actually
looks pretty good at any k value. Even though the output is
reasonable, I am concerned that hitting the best solution by a large
amount on the first run means something is messing up, and this
concern is amplified by the strange pyramid shaped stress plot.
Because metaMDS uses random starts, I don't see how this output is
possible. I've scoured the help files and archives of this list and I
am really now at a loss to explain this.
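
One way to probe this is to run monoMDS directly and compare a fit started from a metric scaling (cmdscale) configuration with fits started from random configurations. This is only a diagnostic sketch. It assumes the dune data set from vegan as a stand-in for the real data, and it rests on my understanding (worth checking against the metaMDS documentation) that run 0 starts from a metric scaling configuration while the later runs start from random ones.

```r
library(vegan)

# Stand-in data; an assumption for illustration, not the PSU data.
data(dune)

# Same transformation and dissimilarity as reported in the output above
d <- vegdist(wisconsin(sqrt(dune)), method = "jaccard")
k <- 2
n <- attr(d, "Size")  # number of sites

# Fit started from a metric scaling (PCoA) configuration
start_pcoa <- cmdscale(d, k = k)
fit_pcoa <- monoMDS(d, y = start_pcoa, k = k)

# Fits started from random configurations
set.seed(7)
random_stress <- replicate(10, {
  start_rand <- matrix(runif(n * k), nrow = n, ncol = k)
  monoMDS(d, y = start_rand, k = k)$stress
})

# Compare the metric-scaling start with the best random start
c(pcoa_start = fit_pcoa$stress, best_random = min(random_stress))
```

If the random starts cluster tightly at a much higher stress than the metric-scaling start, as in the output above, that points towards the random starts stopping early rather than towards a fault in the first run.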

Thank you in advance for your time and consideration!

Ewan
