Re: [R-sig-eco] twinspan classification rules as narrative

2019-12-16 Thread Jari Oksanen
Howdy,

TWINSPAN is not in CRAN. It seems that you found it in github. 

TWINSPAN is an old method, and it seems that people are forgetting how it 
works. Here some narrative:

First, you have defined cut levels to transform your abundance data into binary 
indicator “pseudospecies”. You give these cut levels in your call. Each species 
is split by these cut levels into pseudospecies, and that cut level number is 
added to the name of species. In your example, the indicator pseudospecies at 
division 8 are actually Cladgray1 and Cladnigr1 where the added ‘1’ just means 
that the species just occurs, but can have any abundance value: there is no way 
of knowing its abundance except for the lower limit (>0). In division 1 you 
have, for instance, Cladmiti4 which means that the species occurs at least at 
the cutlevel 4: at least at quantity 5, but it can have any value above that 
limit.

Now to the narrative for the division. The rule for division 8 (that you 
mention in your post) is actually "+Cladnigr1 +Cladgray1 < 1”. So they both are 
at the lowest cut level 1 (present with any abundance), the ‘+’ sign means that 
they are both positive indicator values and you add +1 for every plot where 
they occur. Would the sign be ‘-‘, you would add -1 for each presence to give 
negative scores. Doing this for all species gives you the indicator score: if 
both species are present, your score is 2, if one is present, your scores is 1 
and if neither is present your score is 0. The condition is ‘< 1’ meaning that 
if neither is present (score 0), the condition is true and you go to final 
group 16, but if one or both are present (scores 1 or 2), the  condition is 
false and you continue to division 17. However, this is a tree, and this 
narrative rule only applies to division 8 and those 16 sampling units it 
contains: these are split by this rule. To get to this division with this rule 
you must have satisfied the previous rules leading to this branch. You may see 
the branch structure using plot(twb): the internal divisions are shown in 
squared on tree, and the final groups and their sizes as terminal leaves.

The classification rules give you only the lower limit of species, and 
depending on the indicator score threshold, even some of these indicators may 
be missing in plot. However, you can use function twintable to see the actual 
cutlevels for each species. These serve as a cover-class values, but do not 
give any more detail than the cutlevels you defined.

Cheers, Jari

> On 16 Dec 2019, at 16:44, Gonzalez-Mirelis, Genoveva 
>  wrote:
> 
> Dear all,
> 
> I am trying to understand the results from the twinspan function in the R 
> package that has been recently developed (also named twinspan).
> 
> Particularly, I would like to be able to derive the classification rules 
> (indicator species and abundance values, or rather ranges) for each terminal 
> group of the twinspan classification.
> 
> From this:
> 
> library(twinspan)
> data(ahti)
> twb <- twinspan(ahti, cutlevels = c(0, 0.1, 1, 5, 25, 50, 75))
> summary(twb)
> 
> I understand that say, for group number 16 (the first terminal group 
> encountered) the indicator species were Cladnigr and Cladgray.
> 
> I also understand that the indicator score threshold tells me which path to 
> follow down the tree (left or right). But I struggle to understand just what 
> the indicator score means (1)? And whether it can be related to the original 
> abundance value for those two species at the three relevant sites, namely 
> Ster113, Ster097 and Ster098?
> 
> What would be a narrative way to describe this particular branch of the tree?
> 
> Many thanks in advance,
> 
> Genoveva
> 
> Genoveva Gonzalez Mirelis, Scientist
> Institute of Marine Research
> Nordnesgaten 50
> 5005 Bergen, Norway
> Phone number +47 55238510
> 
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] How to get the total variance explained from an envfit object?

2019-11-08 Thread Jari Oksanen
Dear Claire Elizabeth Couch,

First let me explain how envfit() works: For continuous environmental variables 
it actually uses ordination to predict the environmental variable. Under the 
hood (bonnet), it fits a first degree trend surface (plane in 2D) for 
environmental variable over the ordination scores, and the R2 is the proportion 
that surface explains of the variable. The arrow shown is the gradient of this 
fitted trend surface and shows the direction to which the variable changes most 
rapidly in a first degree linear model.

Clearly you cannot add these R2 values, because your environmental variables 
can be (and normally are) inter-correlated.

It seems that you want to work into another direction than envfit: Predict 
ordination scores by a set of environmental variables.

There are many ways of doing this in R, although we do not provide canned tool 
for this. You can actually do this even with multiple linear model with 
function lm() of R::stats. Here is how to do this with vegan::rda() function:

library(vegan)
data(varespec, varechem)
ord <- metaMDS(varespec, trace = FALSE)
fit <- rda(scores(ord) ~ ., data = varechem)

The basic output of this will show you R2:

Call: rda(formula = scores(ord) ~ N + P + K + Ca + Mg + S + Al + Fe +
Mn + Zn + Mo + Baresoil + Humdepth + pH, data = varechem)

  Inertia Proportion Rank
Total 0.139051.0
Constrained   0.113570.816812
Unconstrained 0.025470.183192
Inertia is variance

Here the “Proportion” for the Constrained component is the overall R2 = 
0.81681. If you want to see the adjusted R2, this is found with

 RsquareAdj(fit)
$r.squared
[1] 0.8168084

$adj.r.squared
[1] 0.5318436

However, you only get the overall R2, but not partial R2 values for single 
variables. You can use anova(fit, by = “margin”) to find the (lacking) marginal 
significances of unique effects of the variables, though.

The regression coefficients can be found with command coef() — and probably you 
want them normalized:

> coef(fit, norm=TRUE)
 RDA1RDA2
N-0.154511701  0.48579513
P 0.002463991 -0.13179802
...
pH0.701009027 -0.66724274

You can also simplify this model in the usual way, for instance with the 
function ordistep() that uses permutation test to drop variables one by one (or 
you can build up this model adding variables one by one if you start with an 
empty model with ordistep() or ordiR2step()):

ordistep(fit)# drop variables
m0 <- update(fit, . ~ 1) # m0 is an empty model
ordistep(m0, scope=fit)  # add variables to an empty model

(After these anova(…, by = “margin”) results also give significant effects.)

All this sounds a bit weird to me (or more than “a bit”), but it can be done. I 
guess there are some readers who get hiccups for using RDA on NMDS, but this 
can be done as the NMDS space is metric (the *transfer* function is non-metric 
from community dissimilarities to metric ordination space). It also sounds 
really odd to have an ordination scores as dependent data in RDA, but this 
exactly answers the problem you presented: predict ordination scores by a set 
of external variables.  After all, RDA is nothing but a linear regression for 
multivariate response data, and there is no need to think it as an ordination.

Cheers, Jari Oksanen

On 7 Nov 2019, at 22:02, Couch, Claire Elizabeth 
mailto:cou...@oregonstate.edu>> wrote:

I am analyzing some microbiome data by using unconstrained ordination (PCA
or NMDS) followed by environmental vector fitting with the envfit function
in the vegan package. The output of envfit includes an r2 value for each
vector or factor included in the envfit model, but I am interested in the
total amount of variation explained by all the vectors/factors, rather than
just stand-alone variables. I presume I cannot simply add up the R2 values
assigned to each environmental variable, because there may be overlap in
the microbiome variation that is "explained" by each environmental
variable. However, there does not seem to be any way of accessing the total
r2 value for the model.

Using an example dataset, this is what I have tried so far:

library(vegan)
library(MASS)

data(varespec, varechem)
library(MASS)
ord <- metaMDS(varespec)
fit <- envfit(ord, varechem, perm = 999)
fit

This shows r2 for each environmental variable, but how do I extract the r2
value for the entire model?

I have tried running fit$r, attributes(fit)$r, and Rsquare.Adj(fit), but
these all return NULL.

I would greatly appreciate any suggestions!

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org<mailto:R-sig-ecology@r-project.org>
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org

Re: [R-sig-eco] Vegan Function anova.cca: No Significance Codes Legend in Output

2019-09-25 Thread Jari Oksanen
Dear Melissa Schindler,

It is neither a “glitch” in vegan  nor something wrong with your analysis: the 
legend for significance stars is only shown when there are stars. The showing 
of stars or the legend is not controlled by vegan, but it is a system-wide 
setting in CRAN. See help page of ?printCoefmat which explains the usage of 
stars.

Cheers, Jari Oksanen

> On 25 Sep 2019, at 20:51, melissa schindler  
> wrote:
> 
> Hello all,
> 
> I have run a redundancy analysis on Hellinger transformed species abundance
> data and environmental variables.
> 
> Once I ran the RDA I proceeded to run a permutation on the resulting rda
> overall and by "axis" and the outputs did not provide a legend of
> significance codes.
> 
> Even if none of the axes were significant am I correct in assuming the
> legend should still be listed in the results of the permutations?
> 
> Is this a glitch in Vegan 2.5-6 or could something be wrong with my
> analysis?
> 
> *The following is the output of the anova.cca*
> 
> anova.cca(rda, by="axis")
> 'nperm' >= set of all permutations: complete enumeration.
> Set of permutations < 'minperm'. Generating entire set.
> Permutation test for rda under reduced model
> Forward tests for axes
> Permutation: free
> Number of permutations: 119
> 
> Model: rda(formula = spe.hel ~ Conduct + DO + pH, data = Zscore_env_var)
> Df Variance   F Pr(>F)
> RDA1  1 0.186195 41.4450 0.2333
> RDA2  1 0.002051  0.4564 0.8833
> Residual  2 0.008985
> *Significance code legend should be listed here with asterisk's denoting
> the p-values*
> 
>> anova.cca(rda)
> 'nperm' >= set of all permutations: complete enumeration.
> Set of permutations < 'minperm'. Generating entire set.
> Permutation test for rda under reduced model
> Permutation: free
> Number of permutations: 119
> 
> Model: rda(formula = spe.hel ~ Conduct + DO + pH, data = Zscore_env_var)
> Df Variance  F Pr(>F)
> Model 3 0.188246 6.9836  0.225
> Residual  1 0.008985
> 
> *Significance code legend should be listed here with asterisk's denoting
> the p-values*
> 
> 
> Thanks for your time!
> 
> Melissa
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] transformation of Bray-Curtis in Euclidean

2019-05-09 Thread Jari Oksanen
Yes, it is possible, and always has been when I have checked (which is not a 
proof). You can check this by seeing that it has no negative eigenvalues in 
principal coordinates analysis (apart from occasional negative almost-zero). 
Legendre & Legendre book discuss this.

Cheers, Jari Oksanen

> On 9 May 2019, at 10:21, Irene Adamo  wrote:
> 
> Hi all,
> 
> I have a very simple question: is it possible that the square-root of
> Bray-Curtis values is Euclidean? if not, is there a way to transform
> bray-curtis which is semi-quantitative in Euclidean?
> 
> thanks a lot!
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] standard deviation error for EcoTest.sample

2019-01-11 Thread Jari Oksanen
No, you do not need to be worried about that warning. You have rarefaction up 
to the observed community, and that is always constant, hence you get the 
warning of zero standard deviation. That should be fixed, but I’m too lazy. 
Contributions welcome in vegan at https://github.com/vegandevs/vegan/.

cheers, Jari Oksanen

On 11 Jan 2019, at 20:09, Charlotte Reemts 
mailto:cree...@tnc.org>> wrote:

I am using EcoTest.sample (rareNMtests package) to compare rarefaction curves 
for 19 vegetation plots on two soil types (alluvial and canyon). The code below 
produces the following warning (more than 50 times): "In cor(x > 0) : the 
standard deviation is zero".
The test still produces all the expected output. I tried sorting the dataset by 
SiteType, but still got the error. Curiously, sorting the data produced a 
significant difference between the site types (p around 0.3), while there was 
no difference between the unsorted data (p around 0.09).
Should I be concerned about the warnings? Why does sorting the dataset make a 
difference?

Thanks,
Charlotte

rawdata<-read.table(text="Plot  SiteTypesp1 sp2 sp3 sp4 sp5 sp6 sp7 sp8 sp9 
sp10sp11sp12sp13sp14sp15sp16sp17sp18sp19
sp20sp21sp22sp23sp24sp25sp26sp27sp28sp29
sp30sp31sp32sp33sp34sp35
2   canyon  1   0   1   0   1   1   0   1   0   0   1   0   0   0   1   0   0   
0   0   0   1   0   0   0   0   0   0   0   0   0   1   0   1   0   0
3   alluvial1   0   0   0   0   1   1   1   0   0   0   0   0   0   1   0   
0   1   0   0   0   0   0   0   0   0   0   0   0   1   0   0   1   0   0
5   alluvial1   0   0   0   0   0   0   1   1   0   0   0   0   1   1   0   
0   1   0   0   0   0   0   0   0   1   0   0   0   0   0   0   1   0   0
6   alluvial1   0   0   0   0   1   0   0   0   0   0   0   0   0   1   0   
0   0   1   0   1   1   0   0   0   1   0   0   0   0   0   0   1   0   0
7   alluvial1   0   0   1   1   0   0   0   0   0   0   0   0   1   0   0   
0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   1   0   0
8   alluvial1   0   1   0   0   0   0   0   0   1   0   0   0   0   1   0   
0   0   0   1   0   1   0   0   0   0   0   0   0   0   1   0   1   0   0
10  alluvial1   0   1   0   0   1   0   0   0   0   0   1   0   0   0   0   
0   0   1   0   0   1   0   0   0   0   0   0   1   0   1   1   1   0   0
11  canyon  1   1   0   0   0   1   0   0   0   0   0   0   0   1   1   0   0   
0   0   0   1   0   1   0   0   0   1   0   1   0   0   0   1   0   0
12  canyon  0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   
0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
13  canyon  1   0   0   0   0   1   0   0   0   0   0   0   0   0   1   0   0   
0   0   0   1   0   0   1   0   0   0   0   0   1   0   0   0   0   0
14  canyon  1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
15  canyon  1   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   
0   0   0   1   0   0   0   0   0   0   0   1   1   0   0   0   0   0
16  canyon  1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
17  canyon  1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
0   0   0   0   1   0   0   1   0   0   0   0   0   0   0   0   0   0
18  canyon  1   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   1   
0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   1   0   0
19  canyon  1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
0   0   0   0   1   0   1   0   0   0   0   0   0   0   0   0   1   0
20  canyon  1   0   0   0   0   1   0   0   0   0   0   0   0   1   1   0   0   
0   0   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   1
22  alluvial1   0   0   0   0   1   0   0   0   1   0   0   1   0   1   0   
0   0   0   0   0   1   0   1   1   0   0   1   0   1   0   0   1   0   0
23  alluvial1   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   
0   0   1   0   0   1   0   0   1   0   0   0   0   0   1   0   0   0   0
", header=T)

data<-rawdata[,-1]
rownames(data)<-rawdata[,1]

library(rareNMtests)
test.data<-EcoTest.sample(data[,-1], by=data$SiteType, MARGIN=1, trace=F)  
#error message and no significant difference

data2<- data[do.call(order, data),]
test.data2<-EcoTest.sample(data2[,-1], by=data2$SiteType, MARGIN=1, trace=F)  
#error message and significant difference



[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org<mailto:R-sig-ecology@r-project.org>
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[[alternative HTML version deleted]]

_

Re: [R-sig-eco] 2-way adonis (PERMANOVA) incl interaction - how to test for main effects?

2018-11-01 Thread Jari Oksanen
ordered models
(which are equivalent to Type II ss tests). And these results will tell you
pretty the same thing as type III tests if there is little or no
interaction. I would not worry about trying to estimate the main effects
while controlling for the interaction (Ellen’s question), which cannot be
done using type I or type II SS in 2-way permanova using adonis. But why
would you want to? The lack of a balanced design results in the main
effects and the interaction not being independent of one another.  Forcing
that independence by using type III ss can only work by essentially
"throwing away" some of the information associated with the main effects,
possibly resulting in an overly conservative test. The lower the
interaction, however, the less is thrown away and the less it matters.


Steve




Stephen Brewer
jbre...@olemiss.edu<mailto:jbre...@olemiss.edu>
Professor
Department of Biology
PO Box 1848
University of Mississippi
University, Mississippi 38677-1848
Brewer web page - 
https://jstephenbrewer.wordpress.com<https://jstephenbrewer.wordpress.com/>
FAX - 662-915-5144 Phone - 662-202-5877





On Oct 31, 2018, at 5:45 PM, Gian Maria Niccolò Benucci <
gian.benu...@gmail.com<mailto:gian.benu...@gmail.com>> wrote:

Thank you Jari,

So to test if there are significant interaction I should use
Stage:Growhouse
i.e. A:B. This will test the interaction and main effects that are marginal
and so removed. How matters then if I include by="margin" or not? The R2
are the same (please see below) but the p-value changes. I assume the
second way is most correct, is it?

*> adonis2(t(otu_fungi_out) ~ Stage : Growhouse, data=metadata_fungi_out,
permutations=)*
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 

adonis2(formula = t(otu_fungi_out) ~ Stage:Growhouse, data =
metadata_fungi_out, permutations = )
  Df SumOfSqs  R2  F Pr(>F)
Stage:Growhouse  3   1.0812 0.23075 1.9998 0.0211 *
Residual20   3.6045 0.76925
Total   23   4.6857 1.0
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


*> adonis2(t(otu_fungi_out) ~ Stage : Growhouse, data=metadata_fungi_out,
by = "margin", permutations=)*
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 

adonis2(formula = t(otu_fungi_out) ~ Stage:Growhouse, data =
metadata_fungi_out, permutations = , by = "margin")
  Df SumOfSqs  R2  F Pr(>F)
Stage:Growhouse  3   1.0812 0.23075 1.9998  0.006 **
Residual20   3.6045 0.76925
Total   23   4.6857 1.0
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Cheers,

Gian

On Tue, 30 Oct 2018 at 05:47, Jari Oksanen 
mailto:jari.oksa...@oulu.fi>> wrote:

Hello Gian,

These formulae expand into different models. Compare

model.matrix(~ Stage:Growhouse, data=metadata_fungi_out)
model.matrix(~ Stage*Growhouse, data=metadata_fungi_out)

The first model (Stage:Growhouse) will also contain (implicitly) main
effects and all these terms are marginal and can be removed, whereas the
latter Stage*Growhouse expands to explicit main effects and interaction
effects, and only the interaction effects are marginal and can be removed.
This is also reflected in the degrees of freedom in your anova table: In
the first case Stage:Growhouse has 3 df, and in the latter only 1 df (and
the main effects ignored had 2 df).

Ciao, Giari

On 29 Oct 2018, at 19:11, Gian Maria Niccolò Benucci <

gian.benu...@gmail.com<mailto:gian.benu...@gmail.com>> wrote:


Hello Jari,

It is a little bit confusing. If A*B unfolds in A+B+A:B then A:B is the
real interaction component.
So, which if the code below will test the variance for the interaction
alone?

adonis2(t(otu_fungi_out) ~ *Stage : Growhouse*, data=metadata_fungi_out,

by = "margin", permutations=)
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 

adonis2(formula = t(otu_fungi_out) ~ Stage:Growhouse, data =
metadata_fungi_out, permutations = , by = "margin")
 Df SumOfSqs  R2  F Pr(>F)
Stage:Growhouse  3   1.0812 0.23075 1.9998  1e-04 ***
Residual20   3.6045 0.76925
Total   23   4.6857 1.0
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


adonis2(t(otu_fungi_out) ~ *Stage * Growhouse*, data=metadata_fungi_out,

by = "margin", permutations=)
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 

adonis2(formula = t(otu_fungi_out) ~ Stage * Growhouse, data =
metadata_fungi_out, permutations = , by = "margin")
 Df SumOfSqs  R2  F Pr(>F)
Stage:Growhouse  1   0.2171 0.04633 1.2045 0.2443
Resid

Re: [R-sig-eco] 2-way adonis (PERMANOVA) incl interaction - how to test for main effects?

2018-10-30 Thread Jari Oksanen
Hello Gian,

These formulae expand into different models. Compare

model.matrix(~ Stage:Growhouse, data=metadata_fungi_out)
model.matrix(~ Stage*Growhouse, data=metadata_fungi_out)

The first model (Stage:Growhouse) will also contain (implicitly) main effects 
and all these terms are marginal and can be removed, whereas the latter 
Stage*Growhouse expands to explicit main effects and interaction effects, and 
only the interaction effects are marginal and can be removed.  This is also 
reflected in the degrees of freedom in your anova table: In the first case 
Stage:Growhouse has 3 df, and in the latter only 1 df (and the main effects 
ignored had 2 df). 

Ciao, Giari

> On 29 Oct 2018, at 19:11, Gian Maria Niccolò Benucci  
> wrote:
> 
> Hello Jari,
> 
> It is a little bit confusing. If A*B unfolds in A+B+A:B then A:B is the
> real interaction component.
> So, which if the code below will test the variance for the interaction
> alone?
> 
>> adonis2(t(otu_fungi_out) ~ *Stage : Growhouse*, data=metadata_fungi_out,
> by = "margin", permutations=)
> Permutation test for adonis under reduced model
> Marginal effects of terms
> Permutation: free
> Number of permutations: 
> 
> adonis2(formula = t(otu_fungi_out) ~ Stage:Growhouse, data =
> metadata_fungi_out, permutations = , by = "margin")
>Df SumOfSqs  R2  F Pr(>F)
> Stage:Growhouse  3   1.0812 0.23075 1.9998  1e-04 ***
> Residual20   3.6045 0.76925
> Total   23   4.6857 1.0
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> 
>> adonis2(t(otu_fungi_out) ~ *Stage * Growhouse*, data=metadata_fungi_out,
> by = "margin", permutations=)
> Permutation test for adonis under reduced model
> Marginal effects of terms
> Permutation: free
> Number of permutations: 
> 
> adonis2(formula = t(otu_fungi_out) ~ Stage * Growhouse, data =
> metadata_fungi_out, permutations = , by = "margin")
>Df SumOfSqs  R2  F Pr(>F)
> Stage:Growhouse  1   0.2171 0.04633 1.2045 0.2443
> Residual20   3.6045 0.76925
> Total   23   4.6857 1.0
>> 
> 
> The results is clearly very different. Also, in a normal adonis call I
> didn't have any significance for the interaction that I have instead if I
> use A:B. So ~ A*B will not test for interactions at all?
> 
>> *adonis*(t(otu_fungi_out) ~ Stage * Growhouse, data=metadata_fungi_out,
> permutations=)
> Call:
> adonis(formula = t(otu_fungi_out) ~ Stage * Growhouse, data =
> metadata_fungi_out,  permutations = )
> 
> Permutation: free
> Number of permutations: 
> 
> Terms added sequentially (first to last)
> 
>Df SumsOfSqs MeanSqs F.Model  R2 Pr(>F)
> Stage10.4877 0.48769  2.7060 0.10408 0.0247 *
> Growhouse10.3765 0.37647  2.0889 0.08034 0.0542 .
> Stage:Growhouse  10.2171 0.21708  1.2045 0.04633 0.2507
> Residuals   20    3.6045 0.18023 0.76925
> Total   234.6857 1.0
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> 
> 
> Thank you!
> 
> Gian
> 
> 
> 
> 
> 
> On Tue, 16 Oct 2018 at 08:54, Jari Oksanen  wrote:
> 
>> 
>> 
>> On 16/10/18 11:23, Torsten Hauffe wrote:
>>>  "adonis2(speciesdataset~A*B, by="margin") but then only the effect of
>> the
>>> interaction is tested."
>>> 
>>> This is not entirely correct.
>>> adonis2(speciesdataset~A:B, by="margin") would test the interaction
>> alone.
>>> ~A*B unfolds to ~A+B+A:B
>> 
>> Well, it was correct: the only **marginal** effect in ~A+B+A:B is A:B (A
>> and B are not marginal), and by = "margin" will only analyse marginal
>> effects.
>> 
>> Cheers, Jari Oksanen
>>> 
>>> On Tue, 16 Oct 2018 at 11:51, Ellen Pape  wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I don't know whether this is the correct mailing group to address this
>>>> question:
>>>> 
>>>> I would like to perform a 2-way permanova analysis in R (using adonis in
>>>> vegan). By default you are performing sequential tests (by="terms"), so
>>>> when you have 2 or more factors, the order of these factors matter.
>>>> However, since I wanted to circumvent this, I chose for the option
>>>> by="margin" (adonis2(speciesdataset~A*B, by="margin")) but then only the
>>>> effect of the interaction is tested. On the "help page" of anova. cc

Re: [R-sig-eco] 2-way adonis (PERMANOVA) incl interaction - how to test for main effects?

2018-10-16 Thread Jari Oksanen
vegan::adonis2 only handles marginal effects with by = “margin” (and hence only 
term A:B of A*B), but RVAideMemoire package has function adonis.II that also 
can do “type II” and “type III” tests (what ever these mean with adonis) which 
may be something you are looking for. I haven’t checked how these tests were 
implemented, but you may do that in your leisure if you think this is what you 
want to have.

Cheers, Jari Oksanen

> On 16 Oct 2018, at 14:53 pm, Ellen Pape  wrote:
> 
> Hi,
> 
> I know that A*B = A+B+A:B, but in this case, i.e. doing an adonis2 and
> specifying by="terms" will only do the test for the interaction, not the
> main effects. If one chooses by="terms", you will get a test for the main
> effects and the interaction, but than the order of factors matters.
> 
> Best regards
> Ellen
> 
> On Tue, 16 Oct 2018 at 12:23, Torsten Hauffe 
> wrote:
> 
>> "adonis2(speciesdataset~A*B, by="margin") but then only the effect of the
>> interaction is tested."
>> 
>> This is not entirely correct.
>> adonis2(speciesdataset~A:B, by="margin") would test the interaction
>> alone.  ~A*B unfolds to ~A+B+A:B
>> 
>> On Tue, 16 Oct 2018 at 11:51, Ellen Pape  wrote:
>> 
>>> Hi all,
>>> 
>>> I don't know whether this is the correct mailing group to address this
>>> question:
>>> 
>>> I would like to perform a 2-way permanova analysis in R (using adonis in
>>> vegan). By default you are performing sequential tests (by="terms"), so
>>> when you have 2 or more factors, the order of these factors matter.
>>> However, since I wanted to circumvent this, I chose for the option
>>> by="margin" (adonis2(speciesdataset~A*B, by="margin")) but then only the
>>> effect of the interaction is tested. On the "help page" of anova. cca it
>>> says: "if you select by="margin" -> the current function only evaluates
>>> marginal terms. It will, for instance, ignore main effects that are
>>> included in interaction terms."
>>> 
>>> 
>>> My question now is: can I somehow get the main effects tested anyhow, when
>>> the interaction term is not significant?
>>> 
>>> Thanks,
>>> Ellen
>>> 
>>>[[alternative HTML version deleted]]
>>> 
>>> ___
>>> R-sig-ecology mailing list
>>> R-sig-ecology@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>> 
>> 
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] 2-way adonis (PERMANOVA) incl interaction - how to test for main effects?

2018-10-16 Thread Jari Oksanen



On 16/10/18 11:23, Torsten Hauffe wrote:
>   "adonis2(speciesdataset~A*B, by="margin") but then only the effect of the
> interaction is tested."
> 
> This is not entirely correct.
> adonis2(speciesdataset~A:B, by="margin") would test the interaction alone.
> ~A*B unfolds to ~A+B+A:B

Well, it was correct: the only **marginal** effect in ~A+B+A:B is A:B (A 
and B are not marginal), and by = "margin" will only analyse marginal 
effects.

Cheers, Jari Oksanen
> 
> On Tue, 16 Oct 2018 at 11:51, Ellen Pape  wrote:
> 
>> Hi all,
>>
>> I don't know whether this is the correct mailing group to address this
>> question:
>>
>> I would like to perform a 2-way permanova analysis in R (using adonis in
>> vegan). By default you are performing sequential tests (by="terms"), so
>> when you have 2 or more factors, the order of these factors matter.
>> However, since I wanted to circumvent this, I chose for the option
>> by="margin" (adonis2(speciesdataset~A*B, by="margin")) but then only the
>> effect of the interaction is tested. On the "help page" of anova. cca it
>> says: "if you select by="margin" -> the current function only evaluates
>> marginal terms. It will, for instance, ignore main effects that are
>> included in interaction terms."
>>
>>
>> My question now is: can I somehow get the main effects tested anyhow, when
>> the interaction term is not significant?
>>
>> Thanks,
>> Ellen
>>
>>  [[alternative HTML version deleted]]
>>
>> ___
>> R-sig-ecology mailing list
>> R-sig-ecology@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> 
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] distance matrix with chao-jaccard method

2018-10-14 Thread Jari Oksanen
vegan::vegdist() function **has** Chao-Jaccard index (method = “chao”). In 
addition, vegan has function chaodist that is similar to designdist(), but uses 
the Chao terms (U, V) allowing you to define any Chao distance (see ?chaodist 
for examples).

Vegan has these choices, but I don’t endorse them: it’s all up to your 
responsibility.

cheers, Jari Oksanen 

> On 14 Oct 2018, at 19:07 pm, Irene Adamo  wrote:
> 
> Dear all,
> I would like to create a distance matrix based on the similarity
> Chao-Jaccard index based on raw abundances in R but so far I have not been
> able to find a package that does it. The Vegan package does not have this
> option and the dis.chao function of CommEcol creates a dissimilarity
> matrix. I hope that you can help find a solution.
> 
> thanks in advance!
> Irene Adamo
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] instability of NMDS

2017-11-22 Thread Jari Oksanen
Dear Chitra,

What do you mean with "extracting scores"? 

Do you have the result object? In that case, use function scores() to exact 
scores.

Do you mean that you cannot get the result object? If so, have you read the 
section on "Convergence Problems" in the function documentation (?metaMDS). 
Have you worked like suggested in that document? What failed when you followed 
the instructions?

Cheers, Jari Oksanen 

From: R-sig-ecology <r-sig-ecology-boun...@r-project.org> on behalf of Chitra 
Baniya <cbban...@gmail.com>
Sent: 22 November 2017 05:52
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] instability of NMDS

*Hi dear R and vegan users,*
*I am a R version 3.4.1 user under the Platform: x86_64-pc-linux-gnu
(64-bit). My vegan package is 2.4-4.*
*Please suggest me best way to extract the sample and species score. Here I
got struct with instability of nmds.*
*Thank you in advance.*
*Chitra *

--






*Chitra Bahadur Baniya, M Sc, M Phil, PhDAssociate ProfessorCentral
Department of BotanyTribhuvan UniversityKirtipurKathmandu, Nepal*
orcid.org/-0002-8746-7601

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Factors in partial RDA part 2

2017-02-08 Thread Jari Oksanen
I said earlier that you can use either dummy variables or factors with little 
difference. However, there are some cases where the differences becomes 
important: that is the case when you start playing with individual dummy 
variables like they were real variables. Don't do that! Do not use forward.sel 
with dummy variables: it makes no sense. Do not look at the VIF values of 
single factor levels: that rarely makes sense, and they can never suggest 
removing single levels of factors. It is the whole factor, in our out.  Never 
partly in, partly out. If you go to play like that, you really should switch to 
standard R way of defining your factor as a factor. The highish VIF values for 
some levels are probably triggered by correlations with some continuous 
variables in your model. 

There should be a generalized VIF in vegan that gives one statistic for the 
whole factor. Contributions are welcome at http://github.com/vegandevs/vegan.

cheers, Jari Oksanen 

From: R-sig-ecology <r-sig-ecology-boun...@r-project.org> on behalf of Andrew 
Halford <andrew.half...@gmail.com>
Sent: 08 February 2017 10:14
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] Factors in partial RDA part 2

Hi Listers,

Further to my last post I am seeking more insights into how to interpret
effects of Factors in a RDA analysis.

I think of a factor as a single variable whose influence on observed fish
distribution patterns I would like to quantify, along with a bunch of other
numerical variables.

To do the analyses this Factor (called 'geom') is turned into a number of
dummy variables (seven actually).

The conceptual problem I am having is that when I do a call to vif.cca to
check on collinearity for example, the output suggests I should remove some
of the dummy variables making up the levels of my Factor. Doing this would
leave me with only 3 of the dummy variables out of the original 8 to put
into the RDA. I then don't see that I am actually testing the Factor 'geom'
anymore but rather just individual variables representing a couple of the
different levels of the original Factor. How do I proceed with this?

The same conundrum for me is seen when I run the forward.sel command to
look at the most efficient number of explanatory variables to have in the
final model. The process selects only some of the dummy variables to
include in the model. Again I struggle to see how I am testing or including
the full effects of the Factor 'geom' if only a few of the dummy variables
are actually included in the model.

# here is the model run with all the potential explanatory variables
('geom' is the FACTOR with 7 levels)

> fish.env <-
rda(fish.h~coral_cover+macroalgae+turf_algae_sqrt+ccc_4thrt+rubble_sqrt+reef_slope_sqrt+rugosity

+exposure+min_d_sqrt+d_range+chl_a_log+popn_density_4throot+fp+protection
+geom,data=env.factor)

# collinearity assessment - the results favour dropping 3 of the 'geo'
dummy variables leaving only 2 for the model.
> vif.cca(fish.env)

coral_cover   macroalgae  turf_algae_sqrt
ccc_4thrt  rubble_sqrt  reef_slope_sqrt
2.688656 3.099972 2.849219
1.771637 2.411291 2.418953
rugosity exposure   min_d_sqrt
d_rangechl_a_log popn_density_4throot
2.967752 3.433961 2.587696
2.643991 3.626107 4.571781
fp  protection   geomgeo_bl
geomgeo_cbrcgeomgeo_isefrgeomgeo_isprc
4.059624 3.195210 4.329852
12.657270 9.57001512.052385
geomgeo_lefr geomgeo_oefr
7.34781217.090731

# here I have kept all the 'geom' dummy variables to submit to forward
selection and it only selects 3 of them, hence it doesnt feel that I am
actually including a Factor 'geom' in the model but rather just a few
individual dummy variables?


> forward.sel(fish.h,env.dummy3,adjR2thresh=R2a.all_fish_env)

Testing variable 1
Testing variable 2
Testing variable 3
Testing variable 4
Testing variable 5
Testing variable 6
Testing variable 7
Testing variable 8
Procedure stopped (alpha criteria): pvalue for variable 8 is 0.092000
(superior to 0.05)
variables order R2 R2Cum   AdjR2CumF  pval
1exposure 8 0.07086240 0.0708624 0.04925455 3.279475 0.001
2  fp13 0.04756799 0.1184304 0.07645089 2.266248 0.002
3   geo_isefr16 0.04571706 0.1641474 0.10298750 2.242500 0.003
4   chl_a_log11 0.03812686 0.2022743 0.12250174 1.911778 0.009
5  geo_bl17 0.03423972 0.2365140 0.13863121 1.749016 0.008
6   geo_isprc21 0.03384005 0.2703541 0.15514682 1.762391 0.012
7 reef_slope_sqrt 6 0.03466752 0.3050216 0.17353919 1.845666 0.007


Any advice appreciated

cheers

Andy






--
Andrew Halford Ph.D
Research Scientist (Kimberley Marine Parks)
Dept. Pa

Re: [R-sig-eco] Factors in partialRDA

2017-02-07 Thread Jari Oksanen
Andy,

That’s correct: if you have coded your variable as a single factor, R (and 
hence vegan) will automatically create dummy variables. If you have originally 
coded your variable as a set of dummy variables, and you have done that 
correctly, you will get the same result. If you do  not drop one of the dummy 
variables, R will do it for you and tell that some of your original variables 
were aliased. This will not influence the results, though. 

In conclusion: everything goes, but it is best to use the R way and define your 
factor as a single factor.

Cheers, Jari Oksanen
> On 7 Feb 2017, at 10:48 am, Andrew Halford <andrew.half...@gmail.com> wrote:
> 
> Hi Listers,
> 
> I am running a partial RDA analysis and I have an environmental dataset
> which includes a non-ordered factor representing reef type.
> 
> I want to test for a significant linear trend in the environmental dataset
> as part of the process of testing for spatial autocorrelation.
> 
> e.g. lineartest <- anova(rda(env,latlong))
> 
> I have coded this Factor variable up as a group of dummy variables equal to
> the number of different reef types to run the test but I am unsure if I
> need to drop one of the dummy variables before running the linear test?
> 
> I will revert back to a single Factor variable when I run the RDA as told
> in the vegan documentation?
> 
> My reading tells me that the call to RDA in Vegan will automatically deal
> with creating dummy variables from a Factor variable presented in the model
> but this is not the case
> 
> cheers
> 
> Andy
> 
> 
> Andrew Halford Ph.D
> Research Scientist (Kimberley Marine Parks)
> Dept. Parks and Wildlife
> Western Australia
> 
> Ph: +61 8 9219 9795
> Mobile: +61 (0) 468 419 473
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Errors with Simprof for cluster significance

2016-10-07 Thread Jari Oksanen
Ansley, I think you may have an old version of clustsig: the current one should 
not give an error of missing undef.zero. Please check if you need to update.

cheers, j.o.
> On 7 Oct 2016, at 19:21 pm, Sarah Goslee  wrote:
> 
> I can't duplicate your error. Assuming the attached document describes
> your data, it works fine. I changed the code to use braycurtis instead
> of euclidean, as you wanted, and it does return a warning, just as the
> help file documents.
> 
> So you'll need to provide more information.
> 
> 
> data<- structure(list(necsur = c(1L, 4L, 0L, 8L, 0L, 1L), necame = c(4L,
> 5L, 9L, 9L, 4L, 7L), niccar = c(1L, 1L, 1L, 2L, 1L, 4L), nicorb = c(2L,
> 20L, 23L, 26L, 3L, 12L), nicpus = c(0L, 0L, 1L, 0L, 0L, 0L),
>nictor = c(0L, 2L, 1L, 3L, 2L, 1L), delgib = c(10L, 31L,
>47L, 48L, 15L, 55L), cancha = c(5L, 6L, 4L, 4L, 1L, 6L),
>melbis = c(3L, 0L, 1L, 3L, 0L, 1L), atelec = c(4L, 6L, 28L,
>22L, 8L, 52L), copmin = c(0L, 0L, 1L, 1L, 0L, 1L), ontcon = c(3L,
>3L, 11L, 7L, 1L, 2L), ontdep = c(2L, 0L, 0L, 0L, 0L, 0L),
>onthec = c(17L, 15L, 9L, 6L, 6L, 2L), ontstr = c(0L, 0L,
>0L, 1L, 1L, 0L), onttau = c(20L, 13L, 6L, 2L, 0L, 2L), ontpen = c(2L,
>3L, 5L, 3L, 2L, 4L), onttub = c(2L, 3L, 4L, 1L, 1L, 0L),
>phavin = c(1L, 0L, 0L, 0L, 0L, 1L), Phyili = c(0L, 0L, 0L,
>0L, 0L, 1L), canvir = c(0L, 1L, 0L, 0L, 0L, 0L), hybill = c(1L,
>0L, 0L, 0L, 0L, 0L), cyclev = c(0L, 0L, 0L, 0L, 1L, 1L),
>galjan = c(0L, 0L, 1L, 1L, 0L, 0L), cyclosig = c(0L, 0L,
>0L, 0L, 1L, 0L), omomon = c(1L, 2L, 4L, 10L, 1L, 6L), trofov = c(1L,
>2L, 3L, 1L, 0L, 1L), trouni = c(1L, 0L, 0L, 1L, 0L, 0L),
>troter = c(0L, 1L, 1L, 0L, 0L, 0L), eusass = c(9L, 8L, 23L,
>14L, 11L, 28L), hiscoe = c(2L, 1L, 10L, 4L, 2L, 4L), hisabb = c(0L,
>0L, 0L, 0L, 2L, 0L), cremax = c(4L, 7L, 1L, 2L, 2L, 5L),
>plamac = c(1L, 0L, 3L, 2L, 2L, 2L), plafem = c(0L, 0L, 0L,
>0L, 0L, 0L), plafos = c(1L, 1L, 3L, 2L, 1L, 2L), placom = c(6L,
>3L, 3L, 10L, 13L, 7L), tacfim = c(0L, 0L, 1L, 0L, 0L, 0L)), .Names
> = c("necsur",
> "necame", "niccar", "nicorb", "nicpus", "nictor", "delgib", "cancha",
> "melbis", "atelec", "copmin", "ontcon", "ontdep", "onthec", "ontstr",
> "onttau", "ontpen", "onttub", "phavin", "Phyili", "canvir", "hybill",
> "cyclev", "galjan", "cyclosig", "omomon", "trofov", "trouni",
> "troter", "eusass", "hiscoe", "hisabb", "cremax", "plamac", "plafem",
> "plafos", "placom", "tacfim"), row.names = c("AP-0", "AP-100",
> "AP-200", "AP-300", "ST-0", "ST-100"), class = "data.frame")
> 
> 
>> simprof(data, num.expected=1000, num.simulated=999,
> +  method.cluster="average", method.distance="braycurtis",
> +  method.transform="identity", alpha=0.05,
> +  sample.orientation="row", const=0,
> +  silent=TRUE, increment=100,
> +  undef.zero=TRUE, warn.braycurtis=TRUE)
> $numgroups
> [1] 2
> 
> $pval
> [,1] [,2][,3]
> [1,]   -3   -4  NA
> [2,]   -61 0.461461461
> [3,]   -1   -2  NA
> [4,]   -53 0.905905906
> [5,]24 0.003003003
> 
> $hclust
> 
> Call:
> hclust(d = rawdata.dist, method = method.cluster)
> 
> Cluster method   : average
> Number of objects: 6
> 
> 
> $significantclusters
> $significantclusters[[1]]
> [1] "ST-100" "AP-200" "AP-300"
> 
> $significantclusters[[2]]
> [1] "ST-0"   "AP-0"   "AP-100"
> 
> 
> Warning messages:
> 1: This version of the Bray-Curtis index does not use standardization.
> 2: To use the standardized version, use "actual-braycurtis".
> 3: See the help documentation for more information.
> 
> 
> On Thu, Oct 6, 2016 at 8:43 PM, Ansley Silva  wrote:
>> Hello all,
>> 
>> I'm using R version 3.3.1, clustsig package.
>> 
>> I have a community dataset and I originally used hclust in the vegan
>> package to get dendrograms, however I need significance on the groups that
>> were formed.  I'd really like to use SIMPROF to look for significance among
>> the groups, but I am running into errors.
>> 
>> 
>> These are the errors I get:
>> 
>>> simprof(apst, num.expected=1000, num.simulated=999,
>> method.cluster="average", method.distance="braycurtis", alpha=0.05,
>> sample.orientation = "column", const = 0, silent = TRUE,increment=100)
>> 
>> Error: argument "undef.zero" is missing, with no default
>> In addition: Warning messages:
>> 1: This version of the Bray-Curtis index does not use standardization.
>> 2: To use the standardized version, use "actual-braycurtis".
>> 3: See the help documentation for more information.
>> 
>>> simprof(apst, num.expected=1000, num.simulated=999,
>> method.cluster="average", method.distance="actual-braycurtis", alpha=0.05,
>> sample.orientation = "column", const = 0, silent = TRUE,increment=100)
>> 
>> Error in if (denom != 0) { : missing value where TRUE/FALSE needed
>> 
>>> simprof(apst, num.expected=1000, num.simulated=999,
>> method.cluster="average", method.distance="braycurtis", alpha=0.05,
>> sample.orientation = 

Re: [R-sig-eco] raremax value >2, cannot run rarefy

2016-08-05 Thread Jari Oksanen
I cannot reproduce anything of this. I cannot see your csv files, and cannot 
reproduce the analysis. I can see one attached file (raremax.txt) that has a 
structure that you call "data". When I use that in place of patp (that I don't 
have), everything runs smoothly. I have no idea what were the warnings you got 
-- did you use warnings() to see them like the output suggested? Were those 
warnings? Did they give any useful hints?

A few words about the extremes you drove rarefy: if you rarefy your community 
to one individual, you will have one species (and  no error), and if you rarefy 
to zero individuals have zero species. It never occurred to me to test the 
function with these absurd values. It is nice to see that zero species with 
zero error was correctly returned. However, it seems that with rarefaction to 
one individual the number of species (1) was correctly reported, but the 
function tried to estimate the standard error in vain with some numerical 
problems: the standard error should be zero exactly, but the function found 
zero numerically (about plus/minus 0.0003), and when this numerical zero 
was negative, you got a warning of taking square root of negative values of 
magnitude sqrt(-1e-16).

You cannot meaningfully rarefy to one individual. The lowest you can do 
meaningfully is to rarefy to two individuals. You cannot meaningfully ask how 
many species you have if you have nothing (zero), or if you only have one 
individual (one). Think about it.

Cheers, Jari Oksanen

From: R-sig-ecology <r-sig-ecology-boun...@r-project.org> on behalf of Ansley 
Silva <daily.p...@gmail.com>
Sent: 04 August 2016 23:30
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] raremax value >2, cannot run rarefy

Hello,

R version 3.2.2, using the vegan package.

I am using the rarefy function to get rarefied values for my rarefaction
curves.
​Please see the attached data and code.
Instead of getting an output from rarefy function of values for S and SE, I
am getting the following message:
"There were 20 warnings (use warnings() to see them)"
It seems that the issue might be that my raremax value for this dataset is
equal to 1.  When I ran the same code on a dataset with a raremax of 5,
everything is okay.  When I ran the same code on a dataset with a raremax
of 0, my output of S and SE were all 0.
I am looking for an explanation for this.  Is there a way to get better
values for the output?  Would PC-ORD do the same thing?

Thanks much,
​

--
Ansley Silva


*"The clearest way into the Universe is through a forest wilderness." John
Muir*


*Graduate Research Assistant*

*University of Georgia*

*D.B. Warnell School of Forestry and Natural Resources*

*180 East Green Street*

*Athens, GA 30602*
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] Follow-up to Vegan metaMDS: unusual first run stress values with large data set

2016-05-03 Thread Jari Oksanen
Ewan,

You already got some good hints from Peter Michin (outside this list), but here 
some comments.

> On 2 May 2016, at 20:22 pm, Ewan Isherwood <ewan.isherw...@gmail.com> wrote:
> 
> Hi r-sig-ecology!
> 
> This is mostly a message for Jari Oksanen or another Vegan developer that
> may be working specifically with metaMDS, but I'm opening up to anyone that
> has any interest in this. First of all, you can see my original post here:
> http://r-sig-ecology.471788.n2.nabble.com/Vegan-metaMDS-unusual-first-run-stress-values-with-large-data-set-td7577720.html
> 
> Basically, I'm having the same issues with the metaMDS engine as above (R
> 3.2.5, Vegan 2.3-5). This time my dataset is larger at 9239 sites x 85
> species.
> 
> I've tried adjusting the sfgrmin value up to an absurd 1e-10,000,000
> (decreasing this value to -7 resolved the issue last time)
> 
> I've tried upping the sratmax to about 0.99 recurring with 77 9's (I don't
> think this should have an effect since it's concerned with the iterations
> stopping when the stress ratio between two iterations goes above the
> inputted value)
> 

Data sets with this size can really be difficult for NMDS because location of 
any point has tinies effect on the stress. In this kind of situations the first 
thing is to find why the analysis stops. There are three stopping criteria and 
you have changed two (steepness of gradient sfgrmin and the change in stress 
sratmax). Your changes are really absurd since they exceed the numerical 
accurracy of digital computers. If you launch metaMDS() with argument trace=2 
you will get more verbose output that reports the stopping criterion used. If 
it is always the same criterion, you could make that one stricter (but stay 
within the numbers digital computers can handle). My first guess is that with 
data of this dimensions, you very easily stop because you exceed the maximum 
number of iterations. If that happens, you actually stop before reaching a 
solution, and you should increase maxit. Only after checking these things you 
should proceed with looking at the peculiarities of your data !
 (disjunct or nearly disjunct subsets etc.)

cheers, Jari Oksanen

> I've tried using the Jaccard and Bray methods (I don't think this should
> have an effect)
> 
> I've trialled 3-6 dimensions randomly (this has in the past affected the
> result, but that might be because of other factors)
> 
> I have always used the noshare = TRUE option otherwise it ejects some of
> the sampling points with rare species to astronomical values on one or more
> axes
> 
> I've tried iterations of this about 20-30 times but it still won't ever
> give me a best solution that isn't the first run. Here is the basic code:
> 
> metaMDS(PSU.sp, k= x, distance = "jaccard", sfgrmin = x, sratmax = x,
> noshare = TRUE)
> 
> I'm happy to privately share the raw dataset with Jari Oksanen if he's
> interested in this phenomenon, but I would have to seek permission for
> anyone else since I do not own it. In the meantime I will investigate other
> methods to analyse this data, which shouldn't be an issue. Since my dataset
> is unusually large for this method, this is probably more for curiosity's
> sake for the Vegan developers.
> 
> Thanks for your help,
> 
> Ewan Isherwood
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] specaccum

2016-03-19 Thread Jari Oksanen

> On 17 Mar 2016, at 20:21 pm, farga -not old one- <id.fa...@gmail.com> wrote:
> 
> Hello I'm working with a species discovery curve and want to use the Clench
> ecuation in said curve, I'm using specaccum {vegan} however I'm not sure if
> speccacum use said ecuation, I also require a summary from said ecuation
> and both curves, the asymptote curve and the curve from my data, in a plot.
> Any ideas or is there any package i should use instead vegan?
> 

I don’t know what is the “Clench equation”, but Google claims that "Clench 
(1979) proposed the use of the Michaelis-Menten equation”. Michaelis-Menten is 
available in fitspecaccum() in vegan.

Cheers, Jari Oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] NMDS axes scores

2016-01-11 Thread Jari Oksanen
Contrary to common misbelief, NMDS ordination space is **metric**. In vegan, 
the ordination space (= the ordination result) is even guaranteed to be 
Euclidean (in isoMDS it can be Minkowski, but this is not allowed with vegan). 
What is non-metric is the regression from observed dissimilarities to the 
Euclidean distances in ordination space. The reason why we do not recommend 
using NMDS axes as independent beasts is that NMDS tries to preserve the 
*distances* among points. Any orthogonal rotation (= turning of ordination 
space) will change scores along rotated axes, but retain the distances among 
points. The vegan NMDS result is rotated to principal components, but still you 
should avoid thinking that this makes dimensions independent from each other, 
although the first maximizes the dispersion of points and axes are orthogonal 
(non-correlated).

PCA ordination is Euclidean in the same way as NMDS. The difference to NMDS are 
that (1) only Euclidean distances among sampling units can be used in PCA (in 
NMDS you can use any adequate dissimilarity), and (2) the mapping is linear 
(instead of non-metric) from observed dissimilarities to Euclidean 
dissimilarities. Try function stressplot() in vegan to see what this means — it 
is available both for NMDS and rda (PCA) results.  CA is similar to PCA except 
that it is based on weighted Euclidean distances. I won’t go into mathematical 
details, but you can see ?wcmdscale in vegan to see how to get CA as a weighted 
Euclidean ordination of Chi-square transformed data. 

PCA and CA have some ordering criteria for their axis and therefore some people 
have used axes from those as independent beasts. I think this is dubious, too, 
but people do it all the time. The PCA/CA also define a multivariate space, and 
taking only one axis as an independent object sounds strange, in particular if 
you take something else than the first axes. 

So what to do with NMDS axes? If you take all NMDS axes and their interactions 
in a regression of type ~ axis1 + axis2 + axis1:axis2 then this is equal to 
fitting a linear trend surface, and the interaction term axis1:axis2 takes care 
that the result is invariant under rotation of NMDS space. Function ordisurf() 
in vegan gives further ideas how to fit surfaces to NMDS *space* (instead of 
simple axis). Also, if you think that some direction in NMDS (not necessarily 
parallel to the axes) is good and you have an indicator variable for that, you 
can use MDSrotate() function in vegan to rotate your solution to that direction 
and then take that rotated axis as your explanatory variable. 

HTH, Jari Oksanen

> On 11 Jan 2016, at 10:38 am, Martin Weiser <weis...@natur.cuni.cz> wrote:
> 
> Hi Conny,
> 
> AFAIK NMDS is *non-metric* and represents distances among objects, not
> gradients along axes (known or unknown): distances along axes are
> stretched as needed locally (NMDS works with rank order), even order of
> the elements along axes does not tell anything. NMDS is great if you
> want to say: Object A resembles object C more than it resembles object
> B, even though C and B are quite similar.
> Try this: run NMDS several times, aim for different number of axes (e.g.
> 1,2,3,5,10) and note the scores of the objects along the first one.  You
> *may* get the same thing.
> 
> If you need scores of the objects in the ordination, use something with
> well defined metrics and axes, e.g. PCA, CA.
> 
> HTH,
> Martin
> 
> On 9.1.2016 05:41, Conny wrote:
>> Hi all,
>> 
>> 
>> 
>> it has been frequently pointed out in this group, that NMDS axes scores
>> shouldn't be used individually for further analysis.  
>> 
>> I therefore would like to include both of my NMDS site scores as a response
>> into a GLM model simultaneously.  Unfortunately, I couldn't find any advice
>> on how to actually do this. I found a  couple of papers using NMDS scores in
>> GLMs, but they all seem to use them individually, fitting separate models to
>> each of the ordination axes.
>> 
>> 
>> 
>> I'm a bit at a loss here and any advice is very much appreciated,
>> 
>> Conny
>> 
>> 
>>  [[alternative HTML version deleted]]
>> 
>> ___
>> R-sig-ecology mailing list
>> R-sig-ecology@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> 
> 
> -- 
> 
> --
> Pokud je tento e-mail součástí obchodního jednání, Přírodovědecká fakulta 
> Univerzity Karlovy v Praze:
> a) si vyhrazuje právo jednání kdykoliv ukončit a to i bez uvedení důvodu,
> b) stanovuje, že smlouva musí mít písemnou formu,
> c) vylučuje přijetí nabídky s dodatkem či odchylkou,
> d) stanovuje, že smlouva je uzavřena teprve výslovným dosažením shody na 
&

Re: [R-sig-eco] NMDS axes scores

2016-01-11 Thread Jari Oksanen

> On 11 Jan 2016, at 14:13 pm, Bob O'Hara  wrote:
> 
> 
> Whilst I'm filling bandwidth, I'm not sure Jari's suggestion that you need 
> the interaction term is correct. If a model is linear in axis1 and axis2, 
> then any rotation is also linear, i.e. the transformation is c_1 axis1 + c_2 
> axis2. So you only need Y ~ axis1 + axis2. Basically, it's still plane. The 
> interaction adds a curve: if you want that you also need to include quadratic 
> terms, i.e. Y ~ axis1 + axis2 + axis1^2 + axis2^2 + axis1:axis2.
> 
Me neither: I think my suggestion to include interaction term was wrong. This 
belief seems to be shared by vegan authors who define linear trend surface 
without interaction in ordisurf().

Cheers, Jari O.

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] NMDS axes scores

2016-01-11 Thread Jari Oksanen
Zoltan,

You’d better ask Bob…

If  you really want to get a synthetic (latent) variable with reduced noise, I 
think you really should be doing Factor Analysis. In particular, you should  
have confirmatory factor analysis, a.k.a. measurement model in latent linear 
structural models. Often taking first axes of PCA can do about the same, except 
that you rarely have sound model construction in PCA. I’m getting more liberal 
minded with age, and I can accept many kind of analyses, though.

Cheers, Jari O.
> On 11 Jan 2016, at 17:29 pm, Zoltan Botta-Dukat 
> <botta-dukat.zol...@okologia.mta.hu> wrote:
> 
> Dear Jari,
> 
> What is your opinion about using first few axes of a metric ordination? I'm 
> aware that it is meaningless using first two axes of NMDS ordination that 
> calculated for three  dimensions. But in my experience, it is often useful to 
> use only first few axes of metric ordination instead of raw data: no 
> ecological relevant information is lost, only the 'noise' is reduced.
> 
> Best wishes
> 
> Zoltan
> 
> 2016.01.11. 11:11 keltezéssel, Jari Oksanen írta:
>> Contrary to common misbelief, NMDS ordination space is **metric**. In vegan, 
>> the ordination space (= the ordination result) is even guaranteed to be 
>> Euclidean (in isoMDS it can be Minkowski, but this is not allowed with 
>> vegan). What is non-metric is the regression from observed dissimilarities 
>> to the Euclidean distances in ordination space. The reason why we do not 
>> recommend using NMDS axes as independent beasts is that NMDS tries to 
>> preserve the *distances* among points. Any orthogonal rotation (= turning of 
>> ordination space) will change scores along rotated axes, but retain the 
>> distances among points. The vegan NMDS result is rotated to principal 
>> components, but still you should avoid thinking that this makes dimensions 
>> independent from each other, although the first maximizes the dispersion of 
>> points and axes are orthogonal (non-correlated).
>> 
>> PCA ordination is Euclidean in the same way as NMDS. The difference to NMDS 
>> are that (1) only Euclidean distances among sampling units can be used in 
>> PCA (in NMDS you can use any adequate dissimilarity), and (2) the mapping is 
>> linear (instead of non-metric) from observed dissimilarities to Euclidean 
>> dissimilarities. Try function stressplot() in vegan to see what this means — 
>> it is available both for NMDS and rda (PCA) results.  CA is similar to PCA 
>> except that it is based on weighted Euclidean distances. I won’t go into 
>> mathematical details, but you can see ?wcmdscale in vegan to see how to get 
>> CA as a weighted Euclidean ordination of Chi-square transformed data.
>> 
>> PCA and CA have some ordering criteria for their axis and therefore some 
>> people have used axes from those as independent beasts. I think this is 
>> dubious, too, but people do it all the time. The PCA/CA also define a 
>> multivariate space, and taking only one axis as an independent object sounds 
>> strange, in particular if you take something else than the first axes.
>> 
>> So what to do with NMDS axes? If you take all NMDS axes and their 
>> interactions in a regression of type ~ axis1 + axis2 + axis1:axis2 then this 
>> is equal to fitting a linear trend surface, and the interaction term 
>> axis1:axis2 takes care that the result is invariant under rotation of NMDS 
>> space. Function ordisurf() in vegan gives further ideas how to fit surfaces 
>> to NMDS *space* (instead of simple axis). Also, if you think that some 
>> direction in NMDS (not necessarily parallel to the axes) is good and you 
>> have an indicator variable for that, you can use MDSrotate() function in 
>> vegan to rotate your solution to that direction and then take that rotated 
>> axis as your explanatory variable.
>> 
>> HTH, Jari Oksanen
>> 
>>> On 11 Jan 2016, at 10:38 am, Martin Weiser <weis...@natur.cuni.cz> wrote:
>>> 
>>> Hi Conny,
>>> 
>>> AFAIK NMDS is *non-metric* and represents distances among objects, not
>>> gradients along axes (known or unknown): distances along axes are
>>> stretched as needed locally (NMDS works with rank order), even order of
>>> the elements along axes does not tell anything. NMDS is great if you
>>> want to say: Object A resembles object C more than it resembles object
>>> B, even though C and B are quite similar.
>>> Try this: run NMDS several times, aim for different number of axes (e.g.
>>> 1,2,3,5,10) and note the scores of the objects along the first one.  You
>>> *may* get the same thing.
>&

Re: [R-sig-eco] Morisita horn similarity index

2015-11-28 Thread Jari Oksanen
Vegan::vegdist has actually two related indices: Morisita index (“morisita”) 
that assumes that input data are integers (individuals, not percentages), and 
its Horn-Morisita (“horn”) variant that does not have this restriction. 
However, you should always check the formulae of indices, because the names may 
be used loosely. There is also Horn index that is not implemented in 
vegan::vegdist. For these indices, vegan follows Krebs’s “Ecological 
Methodology”.

cheers, Jari Oksanen
> On 27 Nov 2015, at 15:41 pm, Zoltan Botta-Dukat 
> <botta-dukat.zol...@okologia.mta.hu> wrote:
> 
> Dear Moses,
> 
> Vegdist supposes that input data are numbers of individuals not percentages. 
> So, I'm afraid it cannot help to Moses.
> 
> 
> 
> 
> 2015.11.27. 13:17 keltezéssel, Roman Luštrik írta:
>> Hi,
>> 
>> `vegdist` function of vegan package
>> <http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/vegdist.html> implements
>> the morisita index. Is this what you're looking for?
>> 
>> Cheers,
>> Roman
>> 
>> 
>> On Fri, Nov 27, 2015 at 4:55 AM, moses selebatso <selebat...@yahoo.co.uk>
>> wrote:
>> 
>>>  Hello
>>> I am trying to analyse diet overlap (level of similarity) between two
>>> species. I have diet composition in %. I have tried to find the best tool,
>>> and thought Morisita horn will do, but I cant find the right package for.
>>> Is this the best tool?
>>> Thank you,
>>> Moses
>>> [[alternative HTML version deleted]]
>>> 
>>> ___
>>> R-sig-ecology mailing list
>>> R-sig-ecology@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>> 
>> 
>> 
> 
> 
> -- 
> Botta-Dukát Zoltán
> 
> Ökológiai és Botanikai Intézet
> Magyar Tudományos Akadémia
> Ökológiai Kutatóközpont
> 
> 2163. Vácrátót, Alkotmány u. 2-4.
> tel: +36 28 360122/157
> fax: +36 28 360110
> botta-dukat.zol...@okologia.mta.hu
> www.okologia.mta.hu
> 
> 
> Zoltán BOTTA-Dukát
> 
> Institute of Ecology and Botany
> Hungarian Academy of Sciences
> Centre for Ecological Research
> 
> H-2163 Vácrátót, Alkomány u. 2-4.
> HUNGARY
> Phone: +36 28 360122/157
> Fax..: +36 28 360110
> botta-dukat.zol...@okologia.mta.hu
> www.okologia.mta.hu
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] RDA

2015-11-26 Thread Jari Oksanen

> On 26 Nov 2015, at 20:05 pm, Marcelino de la Cruz <marcelino.delac...@upm.es> 
> wrote:
> 
> El 26/11/2015 a las 16:27, Richards, Christina escribió:
>> Hello!
>> 
>> That is very helpful and seems to work! Thank you!!
>> 
>> I did not realize we could use raw data in capscale,
> We could and we should! Currently, it is the only way to achieve it.
> 
> is this true only because it is conditioning variables?
> It seems that the current implementation of capscale only accepts a single 
> *data.frame* for both explanatory and conditioning variables
> 
Yes, this is true: current and *future* implementations of capscale/rda/cca 
(and future dbrda) will only have one data= argument. However, in addition to 
variables in the data frame given in data=, you can mix variables in the work 
environment in your formula. If you think you need to have several data frames 
for the data= argument, please consider cbind(). 

cheers, Jari Oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] RDA

2015-11-26 Thread Jari Oksanen

> On 26 Nov 2015, at 17:27 pm, Richards, Christina <c...@usf.edu> wrote:
> 
> Hello!
> 
> That is very helpful and seems to work! Thank you!!
> 
> I did not realize we could use raw data in capscale, is this true only 
> because it is conditioning variables? At any rate it recapitulates our 
> analysis using partial mantel in 2 species with 2 different results (one 
> significant, the other not), so I'm inclined to believe its doing something 
> similar.
> 
If you supply raw community matrix (observations times species), capscale() 
will internally turn it into dissimilarities using specified distance= method 
(defaults “euclidean”). You should verify that the selected distance= method is 
the one you need. The variables on the right-hand-side of the formula must be 
in “raw” (observations times variables) format. They cannot be distances. 

Cheers, Jari Oksanen

> ___
> From: Marcelino de la Cruz <marcelino.delac...@upm.es>
> Sent: Wednesday, November 25, 2015 3:28 AM
> To: Richards, Christina; r-sig-ecology@r-project.org
> Subject: Re: [R-sig-eco] RDA
> 
> Hi Christina,
> 
> I think this could work:
> 
> You should combine the *raw* y and z (i.e.not the Euclidean matrices) in
> the same data.frame (e.g. "yz"), and call capscale like this:
> 
> capscale (x ~y1+y2+...+yn + Condition(z1+z2+...+z80), data=xy)
> 
> 
> Cheers!
> 
> Marcelino
> 
> 
> 
> --
> Marcelino de la Cruz Rot
> Depto. de Biología Y Geología
> Universidad Rey Juan Carlos
> Móstoles España
> 
> 
> El 24/11/2015 a las 22:11, Richards, Christina escribió:
>> 
>> 
>> Hello!
>> 
>> 
>> We are trying to do a partial RDA with 3 matrices but our x matrix has a lot 
>> of missing data. We could use instead distance matrices which imputed 
>> missing data, but when we try to use capscale and the euclidean matrices, it 
>> seems we have to use the formula x~y + condition (z) and we cannot use a 
>> matrix of values for z.  We would like to identify the the effect of y on x 
>> with z partitioned out where:
>> 
>> 
>> x= dataframe of dna methylation with individuals listed in column 1 and 0/1 
>> data across ~80 columns
>> 
>> y= habitat for each individual
>> 
>> z= dataframe of genetic loci with individuals listed in column 1 and 0/1 
>> data across ~80 columns
>> 
>> 
>> Christina Richards, Ph.D.
>> University of South Florida
>> Department of Integrative Biology
>> 4202 East Fowler Avenue SCA 127
>> NES 107 (shipping)
>> Tampa, FL 33620
>> (813)974-5090
>> (813)974-3263 FAX
>> http://www.ecologicalepigenetics.com
>> Twitter: @EcolEpig
>> Facebook: Ecological Epigenetics
>> 
>>  [[alternative HTML version deleted]]
>> 
>> ___
>> R-sig-ecology mailing list
>> R-sig-ecology@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>> 
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Differences between total constrained inertia (rda) and variance explained (varpart()) in vegan

2015-11-02 Thread Jari Oksanen
Probaby the difference is the adjustment: varpart() uses adjusted R-squared, 
but rda() output reports unadjusted proportions. RsqureAdj() function gives 
both.

Cheers, jari Oksanen

From: R-sig-ecology <r-sig-ecology-boun...@r-project.org> on behalf of Tim 
Richter-Heitmann <trich...@uni-bremen.de>
Sent: 02 November 2015 15:58
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] Differences between total constrained inertia (rda) and 
variance explained (varpart()) in vegan

Dear list,

when i perform RDA on a species set constrained with a set of
predictors, which i have tested for their relevance by forward
selection, i get values for inertia:

Inertia Proportion Rank
Total 0.126261.0
Constrained   0.054930.43507   11
Unconstrained 0.071330.56493   48

So, the amount of explained inertia is
0.05493/0.1262

or roughly 43%.

I am now grouping the same predictors into three groups (soil,
vegetation, space) and perform varpart() on them, i
get a total variance (sum of all explained variance) explained of only
30%. This has been consistent for me in a variety of datasets.
Can somebody explain to me the difference?

Thank you!


--
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Quantifying widths of polygons

2015-10-01 Thread Jari Oksanen
The standard R tool is ellipsoidhull in the cluster package (which is a 
recommended R package and should be installed with your R). That will draw a 
minimum volume ellipsoid enclosing your points. See its documentation for 
further hints (like MASS::cov.vme). The cluster package also provides 
volume.ellipsoid function to find the volume. Obviously, these work in 2D.

Cheers, Jari Oksanen

> On 30 Sep 2015, at 17:38 pm, Baldwin, Jim -FS <jbald...@fs.fed.us> wrote:
> 
> One metric for an "average width" that would be quick to calculate might be 
> the diameter of a circle that has the same area as the polygon.  (Of course, 
> if the tree crowns are nowhere near circular, this won't likely be a useful 
> metric.)  Maybe there might be a similar approach for finding an ellipsoid 
> with the same area to deal with your desire for an "eccentricity" metric.  (A 
> fanciful approach might be to perform a principal components analysis on a 
> grid (or dense random selection) of points in the polygon and use the 
> "variance explained" for the two principal components to create the 
> semi-major and semi-minor axes of an ellipse.  In any event, the usefulness 
> of any metric will be based on how well it predicts or is associated with 
> some other variable or variables of interest.)
> 
> Jim
> 
> 
> -Original Message-
> From: R-sig-ecology [mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of 
> Alexander Shenkin
> Sent: Wednesday, September 30, 2015 3:10 AM
> To: r-sig-ecology@r-project.org
> Subject: [R-sig-eco] Quantifying widths of polygons
> 
> Hello all,
> 
> I am working with data on tree crowns, and this data describes points
> (verticies) around the polyhedron of the crown volume (think of the crown as 
> a single volume with vertices and faces).  I can calculate maximum z distance 
> between any 2 points (maximum depth) and maximum x/y distance (maximum 
> width).  These are useful metrics.  I would also like to quantify an 
> "average" width of the polygon in 2D space (x/y only), as well as a metric 
> that would describe the "eccentricity" of the polygon. 
>  But, I'm not sure how to go about doing that.
> 
> In general, I've made the polyhedrons and polygons into convex shapes.
> 
> I have considered getting a centroid, intersecting lines every 10 degrees 
> (for example) going through the centroid with the x/y polygon owin in 
> spatstat, and then analyzing those line lengths.  But, I'm not sure that's 
> the right way to go, and maybe there are already tools out there to do this. 
> Any thoughts anyone might have would be very welcome!
> 
> Thanks,
> Allie
> 
> (btw, I posted this on R-help (and on R-sig-ecology with no response), and it 
> was suggested that a list such as this would be more appropriate... apologies 
> for the cross-post)
> 
> 
> library(rgl)
> library(spatstat)
> library(geometry)
>  x =
> c(1.9,-1.4,1.5,1.8,2.2,0.2,0.6,-0.9,-3.7,1.3,-1.9,-3.4,3.7,2.1,-2.0,-1.9)
> y =
> c(-3.1,3.0,1.1,-1.3,1.0,0.0,1.4,1.6,2.3,-3.6,-1.5,-1.3,0.3,-2.1,0.2,-0.3)
> z = c(5.5,4.5,4.3,4.8,6.7,5.8,7.4,6.2,3.5,2.9,4.0,3.7,3.2,3.0,3.1,8.4)
> depth = max(z) - min(z)
> width_max = max(dist(matrix(c(x,y),ncol=2)))
> 
> xy_win = owin(poly=list(x=x,y=y))
> conv_win = convexhull(xy_win)
> # from here, maybe draw lines every 10 degrees through a centroid?
> # avg_width = ??
> # eccentricity = ??
> 
> # 3D plot of crown polyhedron (convex)
> ps = data.frame(x=x,y=y,z=z)
> crown.surf = t(convhulln(matrix(c(x,y,z),ncol=3)))
> open3d()
> rgl.triangles(ps[crown.surf,1],ps[crown.surf,2],ps[crown.surf,3],col=heat.colors(nrow(ps)),alpha=.2)
> axes3d()
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] calculate dispersion index (betadisper, vegan)

2015-07-30 Thread Jari Oksanen
Natalie  Zoltan,

I'd be more worried about the effect of outliers on the area of convex hulls. 
Here an _ad hoc_ definition of outlier: outlier a single point that has a huge 
effect on the area of an enclosing convex hull.

That said, two points are certainly on a line, and three points can be (almost) 
on a line. Four or more points can be on a line, but rarely they are.

In vegan, we have function betadisper() that is suggested to be used to measure 
the dispersion. It may be interesting to compare this to convex hulls. In 
vegan, summary() of ordihull() result gives the areas of convex hulls (see 
?ordihull), and function ordiaretest() runs a permutation test on the areas of 
convex hulls.

Cheers, Jari Okanen 

On 30/07/2015, at 08:52 AM, Zoltan Botta-Dukat wrote:

 Hi Natalie,
 
 Be careful with convex hull volume, because it is zero if points lie on a 
 line (an near to zero, if the configuration is close to this), independently 
 from the amount of distances.
 
 Zoltan
 
 2015.07.29. 19:26 keltezéssel, Natalie írta:
 hi all
 
 I would like to  calculate a dispersion index or similar from the results of 
 the vegan function betadisper.
 i would like to know if there is a possibility to e.g. calculate the volume 
 of the community dispersion(area of the hull in the PCoA) from the 
 betadisper results.
 common is to show the distance to the centroid.but I have the impression the 
 dispersion within communities are not that clear,especially if you have 
 similar variance.
 The program primer v5 calculates a disperion index based on  the similarity 
 percentage.
 similarity percentage is not an option since it is mainly based on most 
 counted species.
 
 ideas and help are very much appreciated
 cheers natalie
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 
 
 -- 
 Botta-Dukát Zoltán
 
 Ökológiai és Botanikai Intézet
 Magyar Tudományos Akadémia
 Ökológiai Kutatóközpont
 
 2163. Vácrátót, Alkotmány u. 2-4.
 tel: +36 28 360122/157
 fax: +36 28 360110
 botta-dukat.zol...@okologia.mta.hu
 www.okologia.mta.hu
 
 
 Zoltán BOTTA-Dukát
 
 Institute of Ecology and Botany
 Hungarian Academy of Sciences
 Centre for Ecological Research
 
 H-2163 Vácrátót, Alkomány u. 2-4.
 HUNGARY
 Phone: +36 28 360122/157
 Fax..: +36 28 360110
 botta-dukat.zol...@okologia.mta.hu
 www.okologia.mta.hu
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] distances in NMDS ordination space

2015-07-16 Thread Jari Oksanen
Hi Kate,

I think we should use the name Euclidean NMDS for the kind of NMDS we have in 
vegan, because its ordination space is strictly Euclidean. There is nothing 
non-metric in the ordination space. What is non-metric is the transformation of 
observed dissimilarities to optimize the fit to the Euclidean ordination space. 
The central term in the squared stress function is

SUM (theta(d) - delta)^2

where delta are the Euclidean distances among points in the ordination, d are 
the observed community dissimilarities among sampling units, and theta() is a 
non-metric monotone function to transform d. Sum of squared differences is -- 
by definition -- squared Euclidean distance, and hence our kind of NMDS is 
Euclidean. 

The isolated axes are not meaningful *because* the space is Euclidean. 
Euclidean space is invariant under rotation: you can rotate axes and distances 
among points does not change, and you can rotate the axes and the configuration 
of points does not change. Any direction of axes is just as good.

In vegan, the default is to rotate axes to principal components so that first 
dimension is longest. However, you can also rotate a dimension parallel to an 
environmental variable using function MDSrotate. These rotations do not change 
the stress, configuration or distances among points. Such a rotated dimension 
can be more meaningful and there is some justification in using that as a 
variable in some other analysis.

To repeat: vegan NMDS is Euclidean NMDS and the NMDS ordination space is 
Euclidean. Because it is Euclidean, it is rotation-invariant and any rotation 
is equally good. Therefore axes do not have a natural orientation in Euclidean 
space. The only thing that is non-metric is the transformation of community 
dissimilarities. That non-metric transformation is made to optimize the 
goodness of fit to Euclidean ordination space.

Cheers, Jari Oksanen

On 16/07/2015, at 22:19 PM, Kate Boersma wrote:

 Hi all.
 
 I have a methodological question regarding non-metric multidimensional 
 scaling. This is not specific to R. Feel free to refer me to another 
 venue/resource if there is one more appropriate to my question.
 
 Correct me if I'm wrong: NMDS axes are non-metric, which is why NMDS 
 frequently makes sense for community data, but it also means that distances 
 in NMDS ordination space cannot be interpreted simplistically as they can in 
 eigenvalue-based methods like PCA. This is why it is inadvisable 
 (meaningless) to use NMDS axes as response variables in a linear modeling 
 framework (e.g., with environmental variables as predictors).
 
 My question is this: Does that mean that it is also inadvisable to use 
 distances among points in ordination space as response variables?
 
 My (potentially flawed) understanding: While the coordinates may not make 
 sense in isolation, they should be meaningful relative to each other. In a 2D 
 ordination, if communities A  B are closer together in ordination space than 
 communities C  D, that means they have more similar species compositions. 
 Therefore, I should be able to predict the distance between points in a 
 linear modeling framework.
 
 Alternately, I could use the actual distances among communities from my 
 dissimilarity matrix with a method like db-RDA. But I used NMDS over RDA or 
 CCA for a reason. It seems more straightforward to use the distances from my 
 NMDS ordination instead of generating new coordinates from a PCoA to fit an 
 RDA framework (as in db-RDA)... but this logic only works if NMDS distances 
 are informative.
 
 Are these comparable analyses? If not, why not?
 
 I'd love your opinions.
 
 Thank you,
 Kate
 
 -- 
 Kate Boersma, PhD
 Department of Biology
 University of San Diego
 5998 Alcala Park
 San Diego CA 92110
 kateboer...@gmail.com
 http://www.oregonstate.edu/~boersmak/
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] slope for rarefaction curve

2015-05-27 Thread Jari Oksanen
Just for your information: new version of vegan (2.3-0) is now available on 
CRAN, and it has function rareslope() that finds the slope of the rarefaction 
curve at given sample size. It can also calculate the slope at at intermediate 
non-integer sample sizes. Together with this, the latest vegan also adds 
similar slope calculations for analytic species accumulation models (exact, 
rarefaction, coleman) plus all non-linear regression models in 
fitspecaccum() (function specslope()).

Cheers, Jari Oksanen 
On 11/05/2015, at 12:09 PM, Jari Oksanen wrote:

 Basic algebrra seems to lead to this function:
 
 rarederatk - 
 function (x, k)
 {
   x - x[x0]
   J - sum(x)
   d - digamma(J-k+1) -digamma(J-x-k+1)
   g - lgamma(J-x+1) + lgamma(J-k+1) - lgamma(J-x-k+1)-lgamma(J+1)
   d - d*exp(g)
   sum(d[is.finite(d)])
 }
 
 Here 'x' must be a vector of species abundances for a single site, and 'k' 
 the sample site at which the 
 derivative is evaluate. Here a simple test that this seems to work (but 
 please check):
 
 library(vegan)
 data(mite)
 rarecurve(mite[1,]) # first sampling point
 sum(rarecurve(mite[1,]) # 140, evaluate at 126 individuals
 y - rarefy(mite[1,], 126) # 19.47011
 b - rarederatk(mite[1,], 126) # derivetive 0.04032 (with warnings)
 abline(y-126*b, b) # matches the rarecurve plot
 
 Cheers, Jari Oksanen
 
 From: R-sig-ecology r-sig-ecology-boun...@r-project.org on behalf of Zoltan 
 Botta-Dukat botta-dukat.zol...@okologia.mta.hu
 Sent: 11 May 2015 08:49
 To: r-sig-ecology@r-project.org
 Subject: Re: [R-sig-eco] slope for rarefaction curve
 
 Dear Simone,
 
 Function rarefy uses the function developed by Hurlbert, thus if you
 need slope in a certain point (as your graph suggests) you can calculate
 the derivative of this function. It is not an easy job, because
 factorials should be derived. See cues here how it can be done:
 
 http://math.stackexchange.com/questions/300526/derivative-of-a-factorial
 
 If you need mean slope in an interval, simply calculate the difference
 in the calculated values for the beginning and end of the interval, and
 divide the difference by the length of the interval.
 
 Zoltan
 
 2015.05.10. 23:57 keltezéssel, Simone Ruzza írta:
 Dear all,
 
 apologies for the total beginner's question. I was wondering if anyone
 can give some advice on how to calculate the slope for the last 10% of
 the records of a rarefaction curve computed with rarefy from vegan.
 Here is a graphic representation of what I would like to do:
 
 https://dl.dropboxusercontent.com/u/33966347/figure.JPG
 
 I have seen that this has been done in a recent paper and I was
 wondering if anyone may have any code snippet to do that. Sorry, maybe
 this is something really obvious but I have not quite understood how
 to do it.
 
 thanks!
 
 Simone
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 
 
 --
 Botta-Dukát Zoltán
 
 Ökológiai és Botanikai Intézet
 Magyar Tudományos Akadémia
 Ökológiai Kutatóközpont
 
 2163. Vácrátót, Alkotmány u. 2-4.
 tel: +36 28 360122/157
 fax: +36 28 360110
 botta-dukat.zol...@okologia.mta.hu
 www.okologia.mta.hu
 
 
 Zoltán BOTTA-Dukát
 
 Institute of Ecology and Botany
 Hungarian Academy of Sciences
 Centre for Ecological Research
 
 H-2163 Vácrátót, Alkomány u. 2-4.
 HUNGARY
 Phone: +36 28 360122/157
 Fax..: +36 28 360110
 botta-dukat.zol...@okologia.mta.hu
 www.okologia.mta.hu
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] slope for rarefaction curve

2015-05-11 Thread Jari Oksanen
Basic algebrra seems to lead to this function:

rarederatk - 
function (x, k)
{
   x - x[x0]
   J - sum(x)
   d - digamma(J-k+1) -digamma(J-x-k+1)
   g - lgamma(J-x+1) + lgamma(J-k+1) - lgamma(J-x-k+1)-lgamma(J+1)
   d - d*exp(g)
   sum(d[is.finite(d)])
}

Here 'x' must be a vector of species abundances for a single site, and 'k' the 
sample site at which the 
derivative is evaluate. Here a simple test that this seems to work (but please 
check):

library(vegan)
data(mite)
rarecurve(mite[1,]) # first sampling point
sum(rarecurve(mite[1,]) # 140, evaluate at 126 individuals
y - rarefy(mite[1,], 126) # 19.47011
b - rarederatk(mite[1,], 126) # derivetive 0.04032 (with warnings)
abline(y-126*b, b) # matches the rarecurve plot

Cheers, Jari Oksanen

From: R-sig-ecology r-sig-ecology-boun...@r-project.org on behalf of Zoltan 
Botta-Dukat botta-dukat.zol...@okologia.mta.hu
Sent: 11 May 2015 08:49
To: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] slope for rarefaction curve

Dear Simone,

Function rarefy uses the function developed by Hurlbert, thus if you
need slope in a certain point (as your graph suggests) you can calculate
the derivative of this function. It is not an easy job, because
factorials should be derived. See cues here how it can be done:

http://math.stackexchange.com/questions/300526/derivative-of-a-factorial

If you need mean slope in an interval, simply calculate the difference
in the calculated values for the beginning and end of the interval, and
divide the difference by the length of the interval.

Zoltan

2015.05.10. 23:57 keltezéssel, Simone Ruzza írta:
 Dear all,

 apologies for the total beginner's question. I was wondering if anyone
 can give some advice on how to calculate the slope for the last 10% of
 the records of a rarefaction curve computed with rarefy from vegan.
 Here is a graphic representation of what I would like to do:

 https://dl.dropboxusercontent.com/u/33966347/figure.JPG

 I have seen that this has been done in a recent paper and I was
 wondering if anyone may have any code snippet to do that. Sorry, maybe
 this is something really obvious but I have not quite understood how
 to do it.

 thanks!

 Simone

 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--
Botta-Dukát Zoltán

Ökológiai és Botanikai Intézet
Magyar Tudományos Akadémia
Ökológiai Kutatóközpont

2163. Vácrátót, Alkotmány u. 2-4.
tel: +36 28 360122/157
fax: +36 28 360110
botta-dukat.zol...@okologia.mta.hu
www.okologia.mta.hu


Zoltán BOTTA-Dukát

Institute of Ecology and Botany
Hungarian Academy of Sciences
Centre for Ecological Research

H-2163 Vácrátót, Alkomány u. 2-4.
HUNGARY
Phone: +36 28 360122/157
Fax..: +36 28 360110
botta-dukat.zol...@okologia.mta.hu
www.okologia.mta.hu

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] vegan: goodness() and pRDA

2015-03-06 Thread Jari Oksanen
This seems to be a bug in vegan. In fact, there are both bugs and bad design 
decisions.

I have pushed a pull request to https://github.com/vegandevs/vegan which fixes 
the issues in this message plus a couple of other quirks. I'll test that fix a 
bit, and if everything looks OK, it will be merged and included in the next 
vegan release.

Thanks for reporting this.

Best wishes, Jari Oksanen

From: R-sig-ecology r-sig-ecology-boun...@r-project.org on behalf of 
Christoph Eberhard Freiherr von Redwitz christoph.redw...@uni-rostock.de
Sent: 05 March 2015 15:29
To: R-sig-ecology@r-project.org
Subject: [R-sig-eco] vegan: goodness() and pRDA

Hi all!
I am using rda() in vegan and goodness() as a diagnostic tool. I want to know 
how good the species fit to the first two ordination axes.
But I got problems combining partial RDA and goodness(). The species sum of CA 
and CCA don't add up to 1 (as it does in RDA without Condition term). 
Furthermore, I get warnings when only one axis is left over in the result of 
goodness.
Here an example:


#--#
data(dune)
data(dune.env)

mod1 - rda(dune~A1 + Moisture + Management, data=dune.env)
mod2  - rda(dune~ Manure + Condition(A1 + Moisture + Management), 
data=dune.env)
mod3 - rda(dune~A1 + Condition(Manure + Moisture + Management), data=dune.env)

goodness(mod1, mod= CA ,sum=T) + goodness(mod1, mod= CCA ,sum=T)#result as 
expected
goodness(mod2, mod= CA ,sum=T) + goodness(mod2, mod= CCA ,sum=T)#only 4 RDA 
axes left over, they don't end up with 1

#here comes the warning:
goodness(mod3,mod= CA ,sum=T)#works
goodness(mod3,mod= CCA,sum=T) #only 1 RDA axis left. Warning messages. The 
sum option seems to be affected.
#--#

Two questions:
Is the warning because of wrong usage of the method? Can I use goodness with 
partial RDA?
1 - goodness(mod3,mod= CA ,sum=T) should do it in this case?

I hope the mail is clear and not an old story.

Greetings!
Christoph


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] CAPSCALE plot output

2014-12-17 Thread Jari Oksanen
Alex, 

It is possible that the documentation in vegan answers this. You can try 
vegandocs(decision) and go to chapter Scaling in redundancy analysis that 
also applies to capscale results.

Cheers, Jari Oksanen

From: R-sig-ecology r-sig-ecology-boun...@r-project.org on behalf of 
Alexander van den Bos alexander.vanden...@hutton.ac.uk
Sent: 17 December 2014 13:02
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] CAPSCALE plot output

Hello,

I have a quick question about the plot()function in capscale. This appears to 
plot the site scores from the output, but on closer inspection I've noticed 
that the values are not exactly the same as those produced by summary()
Does anyone have an explanation for this? Perhaps there is some scaling 
performed on the data before plotting it

Thanks
Alex


If you are not the intended recipient, you should not read, copy, disclose or 
rely on any information contained in this email, and we would ask you to 
contact the sender immediately and delete the email from your system.
Although the James Hutton Institute has taken reasonable precautions to ensure 
no viruses are present in this email, neither the Institute nor the sender 
accepts any responsibility for any viruses, and it is your
responsibility to scan the email and any attachments.

The James Hutton Institute is a Scottish charitable company limited by 
guarantee.
Registered in Scotland No. SC374831
Registered Office: The James Hutton Institute, Invergowrie Dundee DD2 5DA.
Charity No. SC041796



[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Community distance matrix deconstruction

2014-12-12 Thread Jari Oksanen
Kate,

Your question really may need some clarification, but at the moment it looks to 
me that you want to have row indices and column indices for your 
dissimilarities, and information about within/between dissimilarities. If this 
is what you want to have, it is an easy task.

In the following I use a real data set from vegan to make this task a bit more 
general:

library(vegan)
data(mite, mite.env)
## dissimilarities
d - dist(mite)
## row and column indices
row - as.dist(row(as.matrix(d)))
col - as.dist(col(as.matrix(d)))
## within same class: 1 = within, 0 = between
within - with(mite.env, as.dist(outer(Shrub, Shrub, ==)))
## data frame -- the pedestrian way: snappier alternatives possible 
df = data.frame(row=as.vector(row), col=as.vector(col), 
within=as.vector(within), dist=as.vector(d))
## see it
tail(df)
 tail(df)
# row col within  dist
#2410  68  67  1 691.69502
#2411  69  67  0 716.93863
#2412  70  67  0 700.60973
#2413  69  68  0  78.08329
#2414  70  68  0  24.24871
#2415  70  69  1  67.86015

I don't think you really want to have this: you only believe that you want to 
have this (mauvaise foi, like they used to say).

If you only want to get summaries, check function meandist in vegan.

Cheers, Jari Oksanen



On 13/12/2014, at 02:17 AM, Kate Boersma wrote:

 Hi all.
 
 I have a community analysis data manipulation puzzle for you... hopefully 
 someone can help. Please let me know if this question needs clarification, 
 has previously been answered, or would be better sent to a different list.
 
 Details follow.
 
 Thank you,
 
 Kate
 
 ---
 
 Here is a simplified version of my problem:
 
 I ran a community manipulation experiment with 7 reps of 2 treatments, for a 
 total of 14 communities. Communities 1-7 are in Treatment 1 and 8-14 are in 
 Treatment 2. I identified 5 taxa in the 14 communities and calculated a 
 community dissimilarity matrix (14*14). Now I would like to decompose the 
 distance matrix into a dataframe with the following column headings: 
 community1s, community2s, withinORbetweenTRT, and distance. “Within or 
 between treatment” indicates if the distance is between two communities 
 within the same treatment or between the two treatments (values of 0 or 1).
 
 I did it by hand below to demonstrate, but my actual dataset has 100 
 communities so I need to figure out how to automate it...
 
 df-data.frame(cbind(1:14, 18:5, 3:16, 14:1, 16:3)) #random values
 
 dist-dist(df)
 
 distance-as.vector(dist)
 
 community1s-c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,
 
 4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,
 
 8,8,8,8,8,8,9,9,9,9,9,10,10,10,10,11,11,11,12,12,13)
 
 community2s-c(2,3,4,5,6,7,8,9,10,11,12,13,14,3,4,5,6,7,8,9,10,11,12,13,14,
 
 4,5,6,7,8,9,10,11,12,13,14,5,6,7,8,9,10,11,12,13,14,
 
 6,7,8,9,10,11,12,13,14,7,8,9,10,11,12,13,14,
 
 8,9,10,11,12,13,14,9,10,11,12,13,14,10,11,12,13,14,
 
 11,12,13,14,12,13,14,13,14,14)
 
 #now I need a column for whether or not the comparison is within treatment or
 
 #between treatments. I ordered the sites by treatment so sites 1-7 are in 
 treatment1
 
 #and 8-14 are in treatment2. 0 is within and 1 is between.
 
 withinORbetweenTRT-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,
 
 1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,
 
 1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,
 
 0,0)
 
 #now I can assemble the dataframe:
 
 final.df-cbind(community1s, community2s, withinORbetweenTRT, distance)
 
 final.df
 
 I would appreciate any ideas!
 
 -- 
 Kate Boersma, PhD
 Department of Biology
 University of San Diego
 5998 Alcala Park
 San Diego CA 92110
 kateboer...@gmail.com
 http://www.oregonstate.edu/~boersmak/
 
 Kate S. Boersma, Ph.D.
 kateboer...@gmail.com
 http://people.oregonstate.edu/~boersmak/
 
 Department of Biology
 University of San Diego
 5998 Alcala Park
 San Diego, CA 92110
 
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] help radfit()

2014-11-29 Thread Jari Oksanen
Negative values of what? If it is AIC, it is not a problem. With AIC, only 
differences are meaningful. The observed values can be negative.

Cheers, Jari Oksanen

Sent from my iPad

 On 29.11.2014, at 19.56, Manoeli Lupatini mlupat...@gmail.com wrote:
 
 Dear all,
 
 I am working with radfit() function in vegan with a df which contains a big
 amount of zeros. When is not zero, I have numbers with comma, which
 represent proportions. I am using the family Gamma. I got negative values
 for some models. Below an example of the output:
 
  Null  Preemption   LognormalZipf  Mandelbrot
 37  694.8004  -413.17581  -565.23622  492.945299  -482.93299
 38  415.2763  -340.93024  -363.86510  469.338760  -379.28155
 39 1221.0254  -310.94620 -1490.60959 1131.261212  -342.56346
 40 1448.9174  -328.61790  -794.12383  761.840479  -439.40301
 41 1007.8350  -376.34689  -493.21855  455.072273  -486.60574
 
 Someone can explain if is a problema to get this negative values?
 
 Thanks for your help,
 
 Manoeli
 
 
 -- 
 
 Manoeli Lupatini
 
 Eng. Agrônoma
 Ms. em Ciência do Solo,
 Doutoranda em Ciência do Solo
 Programa de Pós-graduação em Ciência do Solo
 Departamento de Solos, UFSM, Santa Maria
 (55)81262295
 
[[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Minimum Number of Observations for pcaCoDa?

2014-11-21 Thread Jari Oksanen
Rich,

It seems that the robust covariance matrix (? I assume it is that) is not 
non-negative definite... 

Function robCompositios::pcaCoDa seems to use function princomp of base R (or 
its stats package) as the engine to get the principal components. If that 
function is used for raw data, it stops with error message ('princomp' can 
only be used with more units than variables) if the number of columns is 
larger than the number of rows. However, it seems that it may still be able to 
handle these cases if you use covariance matrix, and the last (ncol  nrow) 
eigenvalues will be numerically zero -- that is: the covariance matrix is 
non-negative definite. Normal covariance matrices normally satisfy this (with 
provision of numerical precision), but it seems that the robust covariance 
matrix does not.

Actually, the warning is very clear and says: n  2 * p, i.e., possibly too 
small sample size. The condition I put above was only that n  p, but this 
seems to require that the number of rows is two times higher than the number of 
columns. Because this was not case, the warning came true and you got an error. 
So yes, you need more data if you wish to use this tool. 

Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org r-sig-ecology-boun...@r-project.org 
on behalf of Rich Shepard rshep...@appl-ecosys.com
Sent: 21 November 2014 00:08
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] Minimum Number of Observations for pcaCoDa?

   The compositional data sets have few observations: 4 to 7 rows each, but
there are 5 parts (columns) for each row.

   I tried to use the robCompositions function pcaCoDa(). There was an error
and warning generated:

( winters.biplot - pcaCoDa(winters.coda) )
Error in princomp.default(xilr, covmat = cv, cor = FALSE) :
   covariance matrix is not non-negative definite
In addition: Warning message:
In covMcd(xilr, cor = FALSE) :
   n  2 * p, i.e., possibly too small sample size

   The matrix for winters.code has the counts:

   filter gather graze predate shred
1  3 27 3  11 1
2  3 28 3  13 2
3  3 43 7  15 1
4  4 54 6  24 3
5  3 26 4  22 5
6  1 39 2  18 2

   Same results with the data file winters.acomp:

  filtergather  graze   predate  shred
[1,] 0.0667 0.600 0.0667 0.244 0.0222
[2,] 0.06122449 0.5714286 0.06122449 0.2653061 0.04081633
[3,] 0.04347826 0.6231884 0.10144928 0.2173913 0.01449275
[4,] 0.04395604 0.5934066 0.06593407 0.2637363 0.03296703
[5,] 0.0500 0.433 0.0667 0.367 0.0833
[6,] 0.01612903 0.6290323 0.03225806 0.2903226 0.03225806
attr(,class)
[1] acomp

   Is there a minimum number of observations for PCA or was I using the
incorrect data format?

Rich

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Logistic regression with 2 categorical predictors

2014-10-24 Thread Jari Oksanen
Andrew,

If the 24 rows are the data you are analysing, I cannot replicate any of your 
significant results within that glm framework *if* I take into account the 
overdispersion. The full model with age*test interaction is saturated and 
cannot be analysed at all, but the main effects model age+test can be analysed 
(or either term separately). However,  the results are overdispersed, and you 
should use family=quasibinomial instead of family=binomial, and then use 
test=F instead of test=Chi:

 anova(glm(cbind(prefer,avoid) ~ age+test, data=datafromthemail, 
 family=quasibinomial), test=F)
Analysis of Deviance Table

Model: quasibinomial, link: logit

Response: cbind(prefer, avoid)

Terms added sequentially (first to last)


 Df Deviance Resid. Df Resid. Dev  F Pr(F)
NULL23 54.908  
age   5  11.235218 43.673 0.9844 0.4591
test  3   1.593415 42.079 0.2327 0.8722

As you see, the Resid. Dev is much larger than Resid. Df for both terms in this 
sequential model, and therefore you must use quasibinomial models and F-tests 
-- and these give similar results as other tests.

I could not get any results for the saturated models, and one of your examples 
(below in this thread) seemed to use only one level of test and only *six* 
observations: it was also saturated as you had no replicates for your six age 
levels. You need replicates.

However, I'm not sure I understood your data correctly. It looks like you have 
a certain number of animals, but their number is reduced with age so that you 
have a kind of censored data (animals not available in all cases). Perhaps 
somebody can propose a better analysis for such a censored data, if it is like 
that.

Cheers, Jari Oksanen

On 24/10/2014, at 10:41 AM, Andrew Halford wrote:

 Dear Gavin,
 
 Firstly let me say that I take offence at your bogus comment. Just
 because I, like many others who interact on this list, often struggle
 conceptually with the overwhelming analysis choices that are required in
 our line of work doesn't give you the right to drop snide remarks as you
 see fit.
 
 My line of query is ALWAYS genuine from my perspective and I don't expect
 you or anyone else to belittle people on the public list!
 
 As it turns out my issues are not resolved.
 
 To recap..
 
 I have run a bunch of choice chamber experiments with larval fish. Graphing
 up the ratio of 0/1 choices produces a plot which shows to my eye, evidence
 of a result for some of the tests, with fish appearing to make defined
 choices in the later age groups for 2 of the tests.
 
 What appears to be happening is that because there are some empty cells in
 the later age x test interactions (the fish only took one option to the
 exclusion of the other) the errors are way out and hence preclude any
 chance of getting a significant result. If I add a single result to any of
 the zero cells to remove the blank the analysis actually works more as I
 hoped. However I doubt this is acceptable so I am hoping to get some help
 with producing an effective analysis without having to manipulate the blank
 cells.
 
 Andrew
 
 
 
 
 On 24 October 2014 04:08, Gavin Simpson ucfa...@gmail.com wrote:
 
 This all looks bogus to me; you've fit the data perfectly by fitting a
 saturated model - there are no residual degrees of freedom and
 (effectively) zero residual deviance. Things are clearly amiss because you
 have huge standard errors. You have 24 data points and fit a model with 23
 coefficient plus the intercept; you just replaced your data with 24 new
 data points (the values in the Estimate column of the summary() output)
 
 I really wouldn't bother interpreting it any further.
 
 HTH
 
 G
 
 On 21 October 2014 18:21, Andrew Halford andrew.half...@gmail.com wrote:
 
 Hi Thierry,
 
 The multiple comparisons ran just fine but there was a ridiculous amount
 of
 interaction combinations all of which were non-significant even though
 there was a highly significant interaction term. I decided to remove test
 as a variable to simplify the analysis and run separate single explanatory
 variable logistic regressions. I have included a result below which is
 still producing an outcome I cant explain. Namely, why am I getting such a
 significant result for the ANOVA but when I do the tukey tests nothing is
 significant?
 
 sg_habitat
  Age Prefer Avoid
 1   1 1714
 2   2 2010
 3   3 14 9
 4   4 1312
 5   5  018
 6   6  0 5
 
 model_sg - glm(cbind(Prefer,Avoid) ~ Age, data=sg_habitat,
 family=binomial)
 
 anova(model_sg, test=Chisq)
 
 Analysis of Deviance Table
 
 Model: binomial, link: logit
 
 Response: cbind(Prefer, Avoid)
 
 Terms added sequentially (first to last)
 
 
 Df Deviance Resid. Df Resid. Dev  Pr(Chi)
 NULL 5 36.588
 Age   5   36.588 0  0.000 7.243e-07 ***
 
 
 mc_sg - glht(model_sg, mcp(Age = Tukey))
 
 summary(mc_sg)
 
 Simultaneous Tests

Re: [R-sig-eco] Regression with few observations per factor level

2014-10-23 Thread Jari Oksanen

On 23/10/2014, at 18:17 PM, Gavin Simpson wrote:

 On 22 October 2014 17:24, Chris Howden ch...@trickysolutions.com.au wrote:
 
 A good place to start is by looking at your residuals  to determine if
 the normality assumptions are being met, if not then some form of glm
 that correctly models the residuals or a non parametric method should
 be used.
 
 
 Doing that could be very tricky indeed; I defy anyone, without knowledge of
 how the data were generated, to detect departures from normality in such a
 small data set. Try qqnorm(rnorm(4)) a few times and you'll see what I mean.
 
 Second, one usually considers the distribution of the response when fitting
 a GLM, not decide if residuals from an LM are non-Gaussian then move on.
 The decision to use the GLM should be motivated directly from the data and
 question to hand. Perhaps sometimes we can get away with fitting the LM,
 but that usually involves some thought, in which case one has probably
 already thought about the GLM as well.

I agree completely with Gavin. If you have four data points and fit a 
two-parameter linear model and in addition select a one-parameter exponential 
family distribution (as implied in selecting a GLM family) you don't have many 
degrees of freedom left. I don't think you get such models accepted in many 
journals. Forget the regression and get more data. Some people suggested here 
that an acceptable model could be possible if your data points are not single 
observations but means from several observations. That is true: then you can 
proceed, but consult a statistician on the way to proceed.

Cheers, Jari Oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] SIMPER problem: invalid 'nrow' value (too large or NA)

2014-10-15 Thread Jari Oksanen

On 14/10/2014, at 21:41 PM, mastratton wrote:

 markusvlindh wrote
 Dear all,
 
 I'm having difficulty applying a SIMPER analysis found in vegan, following
 the example provided i the help function of simper. I keep receiving the
 following error message: 
 
 Error in matrix(ncol = P, nrow = n.a * n.b) : 
  invalid 'nrow' value (too large or NA)
 
 My data consist of a community matrix with 200 species and 43 dates (class
 = data.frame) and my groups consists of factors with in total 12 levels.
 
 A mock example could be the following that is working! :
 library(vegan)
 community-data.frame(replicate(43,sample(0:1000,200,rep=TRUE)))
 groups-as.factor(replicate(1,sample(c(Alpha,Beta,Gamma,Epsilon,Bact,Actino,Verr,Unclass,Cyano,Plancto,Eury,Chloro),200,rep=T))
 simper_test-simper(community,groups)
 summary(simper_test)
 
 But please see the attached files for true data that is not working.
 
 Could someone please please assist in what is the problem with my data.
 
 Kind regards!
 
 Markus,
 
 I was getting the same error message and discovered that simper() is not
 written to handle an input 'group' factor that has one or more unique values
 with only one occurrence. Your data (reattached) have two of these
 instances:.
 
...
 The source code for simper() can be modified to allow these instances:
 
Yes, the source can be modified *and* it has been modified to cope with 
one-member groups. You can install the modified version of vegan for Windows, 
or if you have programming tools for other OS's, too, using: 

install.packages(vegan, repos=http://R-Forge.R-project.org;)

This is development version, but we are going to release new CRAN version of 
vegan later this month, and the R-Forge version is very close to the release 
version. (Looks like the build system is down again in R-Forge, and has been 
for a week, so that this is not the most recent and stable version of R-Forge, 
but it is mostly safe.)

Cheers, Jari Oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] [vegan]Error in as.vector(x, mode) : , cannot coerce type 'builtin' to vector of type 'any' when perfroming a cca on a dataframe

2014-10-13 Thread Jari Oksanen
Dear Tim Richter,

Some minimal information is the version of vegan you are using: we have just 
redesigned the permutation functions in github and R-Forge and it is essential 
to know where to look at (and we are not too keen to look at the code that will 
be thrown away in the next release later within a few days).

Secondly, you could give us a traceback() of the error. When you get the error, 
just type:

traceback()

copy and paste the message and send it to us. This at least tells us where you 
get the error.

Thirdly, this is a question specific to the vegan package and may not be too 
exciting for the general public. You may consider contacting the package 
maintainers directly. 

Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org r-sig-ecology-boun...@r-project.org 
on behalf of Tim Richter-Heitmann trich...@uni-bremen.de
Sent: 13 October 2014 16:49
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] [vegan]Error in as.vector(x, mode) : , cannot coerce type 
'builtin' to vector of type 'any' when perfroming a cca on a dataframe

Hi there,

i have three dataframes, which have the same structure (data.frame, 358
observations of 340 variables, every column is numeric), and are
basically submatrices of a parental dataset.

I can perform a CCA on all of them (with the explanatory dataset being
abio)

/cca1_ab - cca(df1~., data=abio)/

I can also perform an anova on all three cca-derived values

/anova(cca1_ab)/

However, if i do

/sig1_ab-anova(cca1_ab, by=term, perm=200)/

which i believe gives me confidence values for the fitting of the
explanatory variables, only one of the cca-derived values gives an
unexpected

Error in as.vector(x, mode) : ,  cannot coerce type 'builtin' to vector
of type 'any'

The same is true when i try to perform an ordistep forward selection.
The other two are working fine. Its hard to come up with an explanation,
as the datasets look and behave the same otherwise.

Any idea why this could happen?

I am sorry for not providing data for reproduction, as the data sets are
pretty large.

--
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Stra�e (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] predict NMS scores for new samples

2014-10-12 Thread Jari Oksanen

On 12/10/2014, at 23:13 PM, Dave Roberts wrote:

 Hi Jonathan,
 
   If you're using the metaMDS function in vegan with the monomds engine then 
 it's possible.  I have posted a function (monomds)  at the bottom of
 
 http://ecology.msu.montana.edu/labdsv/R/labs/lab9/lab9.html
 
 that shows how to generate a labdsv:::nmds object from vegan's monomds 
 function (courtesy of Peter Minchin).  You will have to have package vegan 
 loaded to get the monomds FORTRAN code loaded.
 Then you can use the function addpoints.nmds (also at the bottom of that 
 page) to add points to an existing nmds.
 
   I'm not convinced that it's a good idea, but I worked with someone who 
 needed it to fulfil contract obligations so I wrote it.
 
It looks like a very good idea to me.

Cheers, Jari Oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] adonis() and collinearity

2014-09-17 Thread Jari Oksanen
Dear Jonas Kuppler,

On 17/09/2014, at 16:06 PM, Jonas Kuppler wrote:

 
 if I use only continuous explanatory variables in the adonis() function it is 
 like a multiple linear regression with dissimilarities.
 
 In multiple regression with highly correlated exp. variables I have the 
 problem with multicollinearity and an increasing standard error of the 
 coefficents. Is it the same for the adonis() function?

Yes, it is.

 I would think so since adonis() is analogous to a MANOVA, but I am not 
 sure. And is there any possibility to estimate the influence of 
 multicollinearity in adonis(); like the variance inflation factor for lm?
 
Not directly: adonis does not return all intermediate results that are useful 
to find the vif. It could be modified to return those items and then it would 
be very simple to calculate vif. Not yet, though (neither in plans). There are 
some tricks that may work even with the current code: check Legendre  Legendre 
latest edition.

Cheers, Jari Oksanen
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] simple question about CCA

2014-09-11 Thread Jari Oksanen
Simone, 

If you continue your first example and *look* at the result object, you'll see:

 ccatest
Call: cca(formula = dat[8:144] ~ x1 + x2 + x3 + x4 + x5 + x6 + x7, data
= dat)

  Inertia Proportion Rank
Total   13.76   1.00 
Constrained 13.76   1.00   62
Unconstrained0.00   0.000
Inertia is mean squared contingency coefficient 
Some constraints were aliased because they were collinear (redundant)

This shows you, among other things, that the number of constraints is higher 
than the number of observations: there is no residual variation (Unconstrained 
Inertia = 0), and Some constraints were aliased because they were collinear 
(redundant). You won't see those aliased constraints, and therefore the last 
ones are dropped. In this case, the aliased constraints are:

 alias(ccatest, names=TRUE)
[1] x6SCL x6SL  x6ZC  x6ZCL x7

That is, four levels of x6 and x7. These are not shown. If you change the order 
of the constraints, some other variables may be among those last four ones that 
are not shown and cannot be analysed.

You need either more data (more observations) or a more sensible model with 
fewer constraints. The first way (collect more data) is more heroic, but the 
second is more clever.

If you look at your data (dat), you see that x5 is a factor with 54 levels and 
x6 is a factor with 10 levels. You have 63 observations, and these two together 
with 64 levels are able to completely explain everything and anything in these 
data: you run out of degrees of freedom.

Sorry for top-posting and bad formatting: this MS Outlook.

cheers, Jari Oksanen



From: r-sig-ecology-boun...@r-project.org r-sig-ecology-boun...@r-project.org 
on behalf of Simone Ruzza simone.ruzz...@gmail.com
Sent: 11 September 2014 13:24
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] simple question about CCA

Dear all,

apologies for the simplicity of my question, maybe it has been asked
many times, but I am a total novice to CCA. I have performed a CCA
using a series of environmental variables that comprise a mixture of
categorical and non-categorical variables. What I do not understand is
why when I change the order of my variables and I plot the results, a
variable disappears from the CCA biplot i.e. the last one being
continuous variable. I realised that there might a very simple
question, so I would be happy even with a reference where to find an
answer. Below some code showing what is happening.

thanks in advance,

Simone



require(RCurl)
require(vegan)
x - getURL(https://dl.dropboxusercontent.com/u/33966347/testdata.csv;)
dat- read.csv(text = x)


# example 1 x7 disappear from the plot (note that x5 and x6 are categorical)
ccatest-cca(dat[8:144]~x1+x2+x3+x4+x5+x6+x7,data=dat)
plot(ccatest)

# example 2 x7 is present in the plot
ccatest1-cca(dat[8:144]~x7+x1+x2+x3+x4+x5+x6,data=dat)
plot(ccatest1)

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Manual Rarefaction and CI's

2014-05-28 Thread Jari Oksanen
Nicholas,

Your approach looks perfectly OK. However, rarefied sampling is normally done 
*without* replacement. 

It looks to me that a similar analysis could be done with vegan::specaccum 
function. However, that uses sampling without replacement.

About your specific questions: you can use quantile() function in base R to get 
desired confidence intervals. The CIs should be given as argument probs to 
quantile(). For quantile function you should return a vector of richness values 
instead of a table. You can get similar with table output as well, but you may 
need to write that yourself.

It is excepted that the CI gets narrower as the subsample size increases. In 
classic rarefaction sampling without replacement, there is only one subsample 
of the original sample size and therefore there is no variance but you get the 
observed data. As the subsampled proportion approaches 1, the number of 
possible samples decreases and so does the variance of subsamples. With 
replacement this may not happen as some units are there twice or more and some 
are missing. Subsample of original size with replacement should contain 63.2% 
of your original units. This also means that because you duplicate (or 
multiplicate) some sampling units and omit some, you will systematically 
underestimate richness when you subsample with replacement. This is why we 
normally use subsampling without replacement. However, the larger subsamples 
tend to become more similar, in particular with data set like you described. 
This is necessary and correct. 

This is the most annoying thing in CIs in rarefaction to me. The rarefaction 
variation *only* describes the randomness of subsampling from a fixed sample. 
It does not describe the variance of that fixed sample, or the real variance of 
your observed richness. With real variability I mean the variance you observe 
if you replicate your sampling in Nature and get completely new, independent 
samples from the same real community. I am afraid people use those rarefaction 
CIs and variances as measures of sample variability which is really misleading 
-- they actually describe only the artifactual variance of subsampling from 
fixed samples, when these fixed samples are regarded as error-free and to have 
zero variance of species richness. Using these variances to infer differences 
in species richness in Nature is wrong and misleading. I have tried to spell 
out that warning in vegan manual pages, but people seem to ignore those parts 
of the text. 

cheers, Jari Oksanen

On 28/05/2014, at 20:05 PM, Nicholas J. Negovetich wrote:

 I have a question related to rarefaction of our samples.  Unlike all of the 
 examples of which I'm aware, I'm not working with 'sites', per se.  Instead, 
 I'm working with the parasites of snails.  Snails are infected with 1 
 parasite species, and each pond/stream can hold several parasite species.  
 The difficulty comes when comparing parasite richness between sites (Pond A 
 vs Pond B) and sampling effort (# snails collected) differs.  Rarefaction 
 provides a means to standardize effort on the lowest sample size so that we 
 can test if parasite richness differs between sites.
 
 I'm familiar with the vegan package and how it performs rarefaction.  But, I 
 can't perform this type of analysis because (1.) 'uninfected' snails would be 
 considered empty patches, and (2.) max richness will be 1 (only one parasite 
 species can (usually) infect a snail).  To counter this problem, I've tried 
 to rarefy my samples using randomization methods.  The script is below.  In 
 summary, I sampled (with replacement) my dataset and calculated the number of 
 parasite species from each sample.  I repeated this 100 times and calculated 
 the mean richness for a given sample size.  I did this for sample sizes 1-50 
 (smallest sample size is 50 individuals).
 
 I have a few questions related this this analysis.
 1.  Does this appear correct?
 
 2.  How can I generate CI using this method?  I normally calculate the 2.5% 
 and 97.5% quantiles when performing randomization, but the nature of our data 
 does not lend itself well to quantile calculations.  Could I assume that my 
 bootstrapped means follow a normal distribution?  If not, then what would be 
 the best method for generating confidence intervals?
 
 3.  This last one is the primary reason for this post.  When performing this 
 analysis on my real datasets, I've noticed a peculiar thing.  In one sample, 
 my variance of the bootstrapped samples converges to zero as sample size 
 approaches the true sample size (this is for my sample of the smallest sample 
 size). Relatedly, I've noticed some (but not all) of my CI narrowing as my 
 sample size approaches the actual size.  This suggests that I'm not doing 
 something correctly, but I'm not sure how to correct it (or if I even need to 
 correct it).  The alternative for me is to scrap the rarefaction and perform 
 an alternative analysis, such as randomizing

Re: [R-sig-eco] how are compuetd the species scores of pca from veganpackage ?

2014-05-27 Thread Jari Oksanen
Dear Claire Della Vedova,

It seems that you have searched in many places, except in vegan documentation. 
Look at the vignette on Design decisions, section Scaling in redundancy 
analysis.

Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org r-sig-ecology-boun...@r-project.org 
on behalf of claire della vedova claire.della-ved...@magelisgroup.com
Sent: 27 May 2014 11:17
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] how are compuetd the species scores of pca from 
veganpackage ?

Hi everybody,

I'm working on PCA approach, and comparing outputs from ade4 and vegan
packages.
I'm ok with the normalization of the variables coordinates coming from ade4
outputs.
(with $co : coordinates are scaled to eigen values ; with  $c1 : coordinates
are scaled to 1).

But I have difficulties to understand how are computed the Species scores in
vegan's outputs with scaling 1 or 2 options, and what means  the message
concerning scaling, especially about the 'General scaling constant of
scores'.
For example :

'Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions
* General scaling constant of scores:  4.226177 '
'

I've search on the archives of R-sig-ecology , cross validate and
stackoverflow, and found nothing that helped me. If somebody has some
information about it, I would greatly appreciate some help.
All the best.

Claire Della Vedova

Here some parts of my code :


library(ade4)
library(vegan)

doubs.env - read.csv
('http://www.sci.muni.cz/botany/zeleny/wiki/anadat-r/data
   -download/DoubsEnv.csv', row.names = 1)

## with ade4 ##
pca.ad-dudi.pca(doubs.env, scale = TRUE, center = TRUE, scann = FALSE,nf=3)

# eigen value of the fisrt eigen vector
pca.ad$eig[1]
[1] 5.968749

#variables coordinates in first eigen vector
pca.ad$co[,1]
[1]  0.85280863 -0.81918008 -0.4528  0.75214647 -0.04996375  0.70722171
 [7]  0.83048310  0.90260821  0.79011263 -0.76485397  0.76373149


#check the normalization of laodings
sum(pca.ad$co[,1]^2)
[1] 5.968749
#= coordinatesscaled to eigen values

#variables normed scores in first eigen vector
pca.ad$c1[,1]
[1]  0.34906791 -0.33530322 -0.18535177  0.30786532 -0.02045094  0.28947691
 [7]  0.33992972  0.36945166  0.32340546 -0.31306670  0.31260725


#check the normalization
sum(pca.ad$c1[,1]^2)
[1] 1
#= coordinates scaled to 1

## with vegan ##
pca.veg-rda(doubs.env, scale = TRUE)

# species scores for the fisrt eigen vector, with sacling 1
summary(pca.veg, scaling=1)
  dasaltpendeb pHdurpho
 1.4752228 -1.4170507 -0.7833294  1.3010933 -0.0864293  1.2233806  1.4366031
   nitammoxydbo
 1.5613681  1.3667687 -1.3230752  1.3211335

'Scaling 1 for species and site scores
* Sites are scaled proportional to eigenvalues
* Species are unscaled: weighted dispersion equal on all dimensions
* General scaling constant of scores:  4.226177
'



summary(pca.veg, scaling=2)[1][[1]][,1]
das alt pen deb  pH dur
1.08668311 -1.04383225 -0.57701847  0.95841533 -0.06366582  0.90117039
pho nit amm oxy dbo
1.05823501  1.15013974  1.00679334 -0.97460773  0.97317743
'Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions
* General scaling constant of scores:  4.226177 '




--
View this message in context: 
http://r-sig-ecology.471788.n2.nabble.com/how-are-compuetd-the-species-scores-of-pca-from-vegan-package-tp7578918.html
Sent from the r-sig-ecology mailing list archive at Nabble.com.

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Measurement distance for proportion data

2014-05-13 Thread Jari Oksanen
Typical dissimilarity indices are of form difference/adjustment, where the 
adjustment takes care of forcing the index to the range 0..1, and handles 
varying total abundances / richnesses. If you have proportional data, you may 
not need the adjustment at all, but you can just use any index. That is, it 
does not matter so awfully much what index you use, and for many practical 
purposes it does not matter if data are proportional. Actually, several indices 
may be equal to each with with proportional data. For instance, Manhattan, 
Bray-Curtis and Kulczynski indices are all identical. All you need to decide is 
which name you use for your index -- numbers do not change.

The analysis of proportional data usually covers very different classes of 
models than ANOSIM and friends. Dissimilarities are not usually involved in 
these models. One aspect in proportional data is that only M-1 of M variables 
really are independent. However, this really needs to be taken into account if 
M is low. I have no idea how is that in your case. 

Cheers, Jari Oksanen
On 13/05/2014, at 15:32 PM, Zbigniew Ziembik wrote:

 I am not sure, but it seems that your problem is related to
 compositional data analysis. You can probably use Aitchison distance to
 estimate separation between proportions.
 Take a (free) look at:
 http://www.leg.ufpr.br/lib/exe/fetch.php/pessoais:abtmartins:a_concise_guide_to_compositional_data_analysis.pdf.
 http://dugi-doc.udg.edu/bitstream/10256/297/1/CoDa-book.pdf.
 
 or (commercial):
 Aitchison, J. 2003. The Statistical Analysis of Compositional Data. The
 Blackburn Press.
 
 Best regards,
 ZZ
 
 
 Dnia 2014-05-12, pon o godzinie 16:37 +, Javier Lenzi pisze:
 Dear all, 
 I'm doing data exploration on seabirds trophic ecology data and I am using 
 ANOSIM to evaluate possible differences in diet during breeding and 
 non-breeding seasons. As starting point I am using some classical indexes 
 such as %FO (relative frequency of occurrence), N (number of prey counted in 
 the pooled sample of pellets), %N (N as a percentage of the total number of 
 prey of all food types in the pooled sample), V (total volume of all prey in 
 the pooled sample), and IRI (index of relative importance). 
 I have a concern on which similarity meassurement should I use in ANOSIM for 
 those indexes that are proportions.. I am aware that for instance 
 Bray-Curtis is used for count data (e.g. N) and Jaccard is used for 
 presence-absence data (which I don't have), however I did not find a proper 
 distance measurement for proportion data. Please, could you help me to find 
 a proper distance measurement for these proportion data? 
 Thank you very much in advance. Regards,Javier Lenzi 
   
  [[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Measurement distance for proportion data

2014-05-13 Thread Jari Oksanen
Typical dissimilarity indices are of form difference/adjustment, where the 
adjustment takes care of forcing the index to the range 0..1, and handles 
varying total abundances / richnesses. If you have proportional data, you may 
not need the adjustment at all, but you can just use any index. That is, it 
does not matter so awfully much what index you use, and for many practical 
purposes it does not matter if data are proportional. Actually, several indices 
may be equal to each with with proportional data. For instance, Manhattan, 
Bray-Curtis and Kulczynski indices are all identical. All you need to decide is 
which name you use for your index -- numbers do not change.

The analysis of proportional data usually covers very different classes of 
models than ANOSIM and friends. Dissimilarities are not usually involved in 
these models. One aspect in proportional data is that only M-1 of M variables 
really are independent. However, this really needs to be taken into account if 
M is low. I have no idea how is that in your case. 

Cheers, Jari Oksanen
On 13/05/2014, at 15:32 PM, Zbigniew Ziembik wrote:

 I am not sure, but it seems that your problem is related to
 compositional data analysis. You can probably use Aitchison distance to
 estimate separation between proportions.
 Take a (free) look at:
 http://www.leg.ufpr.br/lib/exe/fetch.php/pessoais:abtmartins:a_concise_guide_to_compositional_data_analysis.pdf.
 http://dugi-doc.udg.edu/bitstream/10256/297/1/CoDa-book.pdf.
 
 or (commercial):
 Aitchison, J. 2003. The Statistical Analysis of Compositional Data. The
 Blackburn Press.
 
 Best regards,
 ZZ
 
 
 Dnia 2014-05-12, pon o godzinie 16:37 +, Javier Lenzi pisze:
 Dear all, 
 I'm doing data exploration on seabirds trophic ecology data and I am using 
 ANOSIM to evaluate possible differences in diet during breeding and 
 non-breeding seasons. As starting point I am using some classical indexes 
 such as %FO (relative frequency of occurrence), N (number of prey counted in 
 the pooled sample of pellets), %N (N as a percentage of the total number of 
 prey of all food types in the pooled sample), V (total volume of all prey in 
 the pooled sample), and IRI (index of relative importance). 
 I have a concern on which similarity meassurement should I use in ANOSIM for 
 those indexes that are proportions.. I am aware that for instance 
 Bray-Curtis is used for count data (e.g. N) and Jaccard is used for 
 presence-absence data (which I don't have), however I did not find a proper 
 distance measurement for proportion data. Please, could you help me to find 
 a proper distance measurement for these proportion data? 
 Thank you very much in advance. Regards,Javier Lenzi 
   
  [[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] calculating standard error of coefficients from adonis model

2014-05-08 Thread Jari Oksanen
Rafter,

I am not Gavin, but I step in and speak for him. 

Gavin quite clearly suggested that you bootstrap adonis. The boot package is 
intended to generate bootstrap samples and to estimate the bootstrap confidence 
limits for *any* function. The boot package does not contain those functions, 
but you can use it to analyse adonis.

At the moment, I have no idea how to directly calculate the SEs you asked for, 
and even if I had, I am sure that bootstrapping would give more reliable 
estimates. We need to make very strong and very wrong assumptions if we try to 
estimate those SEs directly within adonis code, and bootstrapping is certainly 
better. What is sure is that the equations you gave in your message won't work.

I have never used these adonis coefficients, and I have no idea how to use 
them, and therefore I don't have any opinion about their use.

Cheers, Jari Oksanen
On 09/05/2014, at 03:02 AM, Rafter wrote:

 Hi Gavin,
 
 Thanks so much for your reply. By that do you mean move away from adonis,
 and bootstrap a more conventional model that calculates parameter standard
 errors?
 
 Adonis has been the choice for me because of its ability to handle
 multivariate, distribution-independent data in x and y. If you're suggesting
 using another model, do you have any suggestions about type? 
 
 If you're not suggesting moving away from adonis, would perhaps slightly
 unpack your suggestion?
 
 Thanks very much for your time and consideration.
 
 Warmly,
 Rafter
 
 
 
 --
 View this message in context: 
 http://r-sig-ecology.471788.n2.nabble.com/calculating-standard-error-of-coefficients-from-adonis-model-tp7578878p7578882.html
 Sent from the r-sig-ecology mailing list archive at Nabble.com.
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] residuals in RDA, and test for spatial autocorrelation

2014-04-01 Thread Jari Oksanen
Dear Tracy,

You can see if Helene Wagner's mso() in function in vegan satisfies your needs 
for analysing spatial dependence. Reference and further description in ?mso.

Cheers, Jari Oksanen
On 02/04/2014, at 00:12 AM, Pinney, Tracy A wrote:

 Hello List,
 
 I have two questions.
 
 
 1.)How do I generate a matrix of residuals in RDA (I am using rda() in 
 vegan)?
 
 2.)How can I test for spatial dependence (spatial autocorrelation) in the 
 response matrix of a multivariate analysis (RDA in my case)?  I have read 
 that you can look at the trace of the variogram matrix as a multivariate 
 alternative to Moran's I, but I don't know the details of how to make this 
 test happen using R.
 
 Thanks for any help!
 
 Tracy
 
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] cross validation in CoCA and CCA

2014-03-29 Thread Jari Oksanen
Jesse,

I do not know what you mean with CV in this context, but basic cross validation 
can be done with vegan functions cca(), rda() and capscale(). These functions 
have predict method that accepts 'newdata', and using new data allows cross 
validation. They also have a calibrate() function that can directly estimate 
the values of predictor values from community data, and this also has 
'newdata'. So you can build (train) a model and then use independent 
'newdata' to use (test) the model. However, we do not have any generic 
crossvalidate(object, data, k, …) function for canned cross validation process, 
but you have to do this by hand. Neither do we have any functions for multistep 
CV or structured CV where some of the external variables were known and others 
predicted/calibrated. However, basic facilities for hand crafting such models 
are provided. Simple things, like k-fold cross validation are really trivial to 
build with ordination (but if you build in the uncertainty of model building in 
the process --- like you should --- you must be very careful in collecting the 
data as the variables can change in cross validation).

Here one 5-fold CV cycle with rda:

library(vegan)
data(mite, mite.env)
## 5-fold CV
k - rep(1:5, len=nrow(mite))
## x is matrix to collect predictions for two vars
x - matrix(NA, nrow=nrow(mite), ncol=2)
colnames(x) - c(SubsDens, WatrCont)
## shuffle for each CV
k - sample(k)
## the next line could be broken into several commands within {}
for(i in 1:5) x[k==i,] - calibrate(rda(decostand(mite, hell) ~ 
SubsDens+WatrCont, mite.env, subset = k != i), newdata = decostand(mite[k==i,], 
hell))

Easy, but not very good a prediction (cca would be marginally better, like it 
usually is).

Cheers, Jari Oksanen

On 29/03/2014, at 04:44 AM, Gavin Simpson wrote:

 In short, no. I haven't ported the rough code for LOO CV of CCA or
 CCA-PLS models. I think I ported the mean centring and crossval
 functions from the Matlab sources, but not the code in the
 `example_crossvalCCA.m` file from the supplementary materials on the
 CoCA paper in Ecology.
 
 I could take a look and see how easy i will be to add this, but it
 doesn't sit well with cocorresp or vegan as the former was designed
 really for CoCA and the latter doesn't have the other functionality
 needed (which exists in cocorresp) and we've not really implemented CV
 for ordination methods.
 
 That said, this is R and it is relatively trivial to write your own
 LOO or k-fold CV loop, and you can predict from a CCA model using the
 `predict()` method for cca objects available in vegan.
 
 Part of the reason, at least as far as I see things, for not having CV
 in the common ordination software (closed or open source) is that
 these methods tend not to be seen as purely predictive models, which
 is what CV is designed to evaluate.
 
 Don't hold your breath for me getting this in cocorresp, but if you
 want to follow up I might be persuaded to take a look and see if what
 is already in cocorresp will enable you to follow the code in the
 `example_crosvalCCA.m` file to write your own LOO code.
 
 HTH
 
 G
 
 On 28 March 2014 14:57, Jesse Becker jcbecke...@gmail.com wrote:
 Hello list,
 I am doing a concordance study between riverine environmental conditions,
 invertebrate, and fish assemblages.  I am doing a predictive CoCA as part
 of the analysis with the cocorresp package.  My question is whether there
 is an implementation of the cross-validation procedure in the cocorresp
 package that would work on the results of a CCA or RDA, without having to
 use MATLAB (which I don't have access to)?  My understanding is that by
 doing the cross validation on the CCA (and hopefully RDA, although I've
 never seen it done) it allows for a more consistent evaluation of
 differences between the two methods.  I haven't seen this as a function in
 vegan.
 
 Jari?  Gavin?
 
 Thanks,
 Jesse
 
 
 Jesse C. Becker, Ph.D.
 765.285.8889765.285.8889 office
 512.587.4428512.587.4428 cell
 jcbec...@bsu.edu
 jcbecke...@gmail.com
 
 Call
 Send SMS
 Add to Skype
 You'll need Skype CreditFree via SkypeI am
 
[[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 
 
 
 -- 
 Gavin Simpson, PhD
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Vegan-Adonis-NMDS-SIMPER

2014-03-27 Thread Jari Oksanen
Patches welcome. It is best to use vegan in GitHub.

Pairwise tests are not high on my TODO list, because they are so much against 
what I've learnt from statistical theory and I detest tests.

Cheers, Jari Oksanen


Sent from my iPad

 On 27.3.2014, at 17.54, Gavin Simpson ucfa...@gmail.com wrote:
 
 No, that will just consider the dispersions *about the centroids* not
 location shifts of the centroids. The latter is what `adonis()` does,
 but we don't have pairwise comparisons (with/without permutation test)
 there or the Tukey post-hoc tests. I suppose we *could* automate the
 process that Steve suggests, just as I automated it for
 `betadisper()`, and I think this has been raised before, but it hasn't
 risen to the top of anyone's TODO list yet to actually see it
 implemented. Patches welcome :-) !
 
 G
 
 On 27 March 2014 06:55, Johannes Björk bjork.johan...@gmail.com wrote:
 Hi,
 
 For that I believe you can run TukeyHSD.betadisper... to getting significant 
 values between levels. see ?TukeyHSD.betadisper
 
 Cheers,
 On Mar 27, 2014, at 1:47 PM, Brandon Gerig wrote:
 
 Hi Steve,
 
 Yes, this is precisely what I am interested in doing. It seems like
 betadisper might be a good way to visualize differences/similarities in the
 dispersion and examine differences among centroids for the levels within a
 factor. Am I correct in thinking that if I conduct additional PERMANOVA
 tests on a reduced data set, I could be evaluating differences between the
 levels of a main effect?
 
 Could anyone provide a citation for a paper that uses a similar procedure?
 
 
 On Wed, Mar 26, 2014 at 3:21 PM, Steve Brewer jbre...@olemiss.edu wrote:
 
 Brandon,
 
 Are you asking if you can use betadisper as a substitute for post-anova
 pairwise comparisons among levels? After using betadisper to obtain
 dispersions, I believe you can plot the centroids for each level. In
 addition to telling you if the dispersions differ among levels, you could
 see how the centroids differ from one another. Is this what you want to
 know? If so, realize that it won't give you pairwise significance tests
 for differences between levels. For that, you might want to do additional
 permanovas on reduced datasets containing only the two levels you want to
 compare. You could then adjust the p-values for multiple tests after the
 fact.
 
 Hope this helps,
 
 Steve
 
 
 J. Stephen Brewer
 Professor
 Department of Biology
 PO Box 1848
 University of Mississippi
 University, Mississippi 38677-1848
 Brewer web page - http://home.olemiss.edu/~jbrewer/
 FAX - 662-915-5144
 Phone - 662-915-1077
 
 
 
 
 On 3/26/14 10:57 AM, Brandon Gerig bge...@nd.edu wrote:
 
 Thanks for the words of caution on simper.
 
 Am I completely off base in thinking that betadiver function (analgous to
 Levene's test) could be used to examine variation between levels within
 main effects?
 
 Cheers
 
 
 On Mon, Mar 24, 2014 at 5:08 PM, Brandon Gerig bge...@nd.edu wrote:
 
 I am assessing the level of similarity between PCB congener profiles in
 spawning salmon and resident stream in stream reaches with and without
 salmon to determine if salmon are a significant vector for PCBs in
 tributary foodwebs of the Great Lakes.
 
 My data set is arranged in a matrix where the columns represent the
 congener of interest and the rows represent either a salmon (migratory)
 or
 resident fish (non migratory) from different sites.  You can think of
 this
 in a manner analogous to columns representing species composition and
 rows
 representing site.
 
 Currently, I am using the function Adonis to test for dissimilarity
 between fish species, stream reaches (with and without salmon) and lake
 basin (Superior, Huron, Michigan).
 The model statement is:
 
 
 m1adonis(congener~FISH*REACH*BASIN,data=pcbcov,method=bray,permutation
 s=999)
 
 The output indicates significant main effects of FISH, REACH, and BASIN
 and significant interactions between FISH and BASIN, and BASIN and
 REACH.
 
 Is it best to then interpret this output via an NMDS ordination plot or
 use something like the betadiver function to examine variances between
 main
 effect levels or both?
 
 Also,  can anyone recommend a procedure to identify the congeners that
 contribute most to the dissimilarity between fish, reaches, and
 basins?. I
 was thinking the SIMPER procedure but am not yet sold.
 
 Any advice is appreciated!
 --
 Brandon Gerig
 PhD Student
 Department of Biological Sciences
 University of Notre Dame
 
 
 
 --
 Brandon Gerig
 PhD Student
 Department of Biological Sciences
 University of Notre Dame
 
 [[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 
 
 --
 Brandon Gerig
 PhD Student
 Department of Biological Sciences
 University of Notre Dame
 
  [[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing

Re: [R-sig-eco] report out by t.test

2014-03-24 Thread Jari Oksanen
Except that t-test does not assume that *observations* are normally 
distributed, nor that variances are equal.

Avoid non-parametric tests: they assume too much of data properties.

For var.equal assumption in t.test, see ?t.test.

Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] 
on behalf of Richard Boyce [boy...@nku.edu]
Sent: 24 March 2014 13:23
To: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] report out by t.test

Mike,

There is no way that your data meet the assumptions of a t-test (normal 
distributions, equal variance). A nonparametric Mann-Whitney (aka Wilcoxon) 
test is much better suited to your data.

Here's what I got when I ran it:

Q-c(13,0,10,2,0,0,1,0,0,1,5)
WD-c(0,0,1,0,0,0,0,0,0,0,1)
wilcox.test(Q,WD)

Wilcoxon rank sum test with continuity correction

data:  Q and WD
W = 86.5, p-value = 0.05119
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(Q, WD) : cannot compute exact p-value with ties

This has a p-value quite close to 0.05, giving some evidence that there's a 
difference between your groups. Note that this you have different null and 
alternative hypothesis: groups are the same vs. groups are different.

Rick Boyce

On Mar 24, 2014, at 7:00 AM, 
r-sig-ecology-requ...@r-project.orgmailto:r-sig-ecology-requ...@r-project.org 
wrote:

Message: 1
Date: Sun, 23 Mar 2014 14:21:41 -0700
From: Michael Marsh sw...@blarg.netmailto:sw...@blarg.net
To: r-sig-ecology@r-project.orgmailto:r-sig-ecology@r-project.org
Subject: [R-sig-eco] report out by t.test
Message-ID: 532f5065.7030...@blarg.netmailto:532f5065.7030...@blarg.net
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I test differences between frequency of hits of exotic annual forbs in
plots on  two sites, Q and WD.

Q-c(13,0,10,2,0,0,1,0,0,1,5)
WD-c(0,0,1,0,0,0,0,0,0,0,1)
t.test(Q,WD)

Welch Two Sample t-test

data:  Q and WD
t = 1.9807, df = 10.158, p-value = 0.07533
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3342006  5.7887460
sample estimates:
mean of x mean of y
2.9090909 0.1818182

The p-value is greater than 0.05, thus does not reach the 95% confidence
level, yet the difference in means is reported as not equal to 0.
Am I encountering a one-sided versus two sided comparison that I don't
understand, or is ther another explanation?

Mike Marsh





Richard L. Boyce, Ph.D.
Director, Environmental Science Program
Professor
Department of Biological Sciences, SC 150
Northern Kentucky University
Nunn Drive
Highland Heights, KY  41099  USA

859-572-1407 (tel.)
859-572-5639 (fax)
boy...@nku.edumailto:boy...@nku.edu
http://www.nku.edu/~boycer/
=

One of the advantages of being disorderly is that one is constantly making 
exciting discoveries. - A.A. Milne






[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Extract residuals from adonis function in vegan package

2014-03-18 Thread Jari Oksanen
Dear Alicia Valdés, 

On 18/03/2014, at 13:53 PM, Alicia Valdés wrote:
 
 My problem is that I cannot figure out how to get residual values from the
 adonis model.
 
You cannot get residuals from the output of adonis(). 

We could change the function so that this is possible, but the current function 
does not return information for getting residuals. Neither would they be 
residuals in the traditional meaning of the word as we are dealing with 
dissimilarities or distances, and these cannot be negative. We got to discuss 
this with vegan developers.

Cheers, Jari Oksanen
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Extract residuals from adonis function in vegan package

2014-03-18 Thread Jari Oksanen

On 18/03/2014, at 15:23 PM, Eduard Szöcs wrote:

 Dear Alicia and Jari,
 
 just a thought:
 Couldn't be capscale or betadisper be used for this?
 - To obtain the distances to the group centroid?
 
 But than: How to convert this from distances to abundances?
 
You can *almost* do this with capscale(), but not quite: for semimetric 
dissimilarities the results are not identical with capscale (they are identical 
with metric distances). The capscale() function also has fitted() and 
residuals() methods that both return dissimilarities. Now it also depends on 
what you mean with residuals. The capscale() interpretation and the one I had 
on my mind is that 

1) adonis(fitted(adonis(y ~ model)) ~ model) should give distances where the 
fitted part of adonis(y ~model) and the residual variation part should be null, 
and

2) adonis(residuals(adonis(y ~model) ~ model) should give distances where fit 
would be null and residual similar as in the original adonis(y ~ model).

It would be possible to develop such functions, but not with the current 
adonis() output. You can approximate both of these with capscale() and its 
fitted() and residuals() methods, but not exactly. 

The ecodist package of Sarah Goslee takes a different approach, and could 
return something usable (but I do not know that package very well).

What really is needed depends on what you mean with residuals. Should they be 
dissimilarities (which cannot be negative) or straightforward residuals (which 
have an average of zero and some of which are negative).

Cheers, jari Oksanen


 

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] How to accommodate data with negative values for Canonical Correspondence Analysis in R using vegan Package

2014-02-26 Thread Jari Oksanen
It depends *where* to include negative values. Negative values are OK as 
constraints (environmental variables, right hand side of the model formula). 
However, all marginal sums of the response data (left hand side of the model 
formula) must be above zero. It is technically possible to have some negative 
entries in the response matrix as long as the marginal sums are positive, but 
you really should not have them. *CA family of methods were originally 
developed for non-negative data, and having negative entries usually indicates 
that your data are not at all suitable for the method. If you insist on the 
analysis, then you really must know what you are doing, but if you really know, 
you do not need ask in R-sig-ecology.

I still repeat: you can have negative data as constraints. If that fails, then 
something else is wrong with your data.

If you want to use CCA family of methods for negative response data, then RDA 
with equal scaling of variables is usable. Very commonly negative values also 
indicate that your response variables were measured in different scales and 
units, and therefore you must set scale=TRUE in the rda() call. Not knowing the 
data or any other details, this is a blind watchmaker recommendation.

Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] 
on behalf of Ivailo [ubuntero.9...@gmail.com]
Sent: 26 February 2014 16:23
To: Rajendra Mohan panda
Cc: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] How to accommodate data with negative values for 
Canonical Correspondence Analysis in R using vegan Package

On Wed, Feb 26, 2014 at 2:43 PM, Rajendra Mohan panda
rmp.iit@gmail.com wrote:
...
 I have temperature data with negative values which I am not able to include
 for my CCA ordination. ...

Rajendra, I am curious -- why are you not able to include the negative
values in the CCA ordination?

--
The cure for boredom is curiosity. There is no cure for curiosity. --
Dorothy Parker

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] cca

2013-12-20 Thread Jari Oksanen

On 20/12/2013, at 10:15 AM, Mahnaz Rabbaniha wrote:

 Dear all
 
 I have done cca in vegan package,
 
 bio file are the zooplankton genus
 m file is the hydro biological groups
 
 
 vare.cca - cca(bio,m)
 
 plot(vare.cca)
 
 
 the main question is about this plot, in the plot only we see the
 hydrobiological variables, is there any way or code in plot that also show
 the names of  zooplankton groups?
 
Dear Mahnaz Rabbaniha,

Have you tried plot(vare.cca, type=t)?

Cheers, Jari Oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Ranked abundance distribution

2013-12-17 Thread Jari Oksanen
Hello,
On 17/12/2013, at 17:01 PM, Sol Noetinger wrote:

 Hello,
 
 Cluster II
 RAD models, family poisson 
 No. of species 35, total abundance 100
 
par1 par2par3   Deviance AIC BIC
 Null   25.7004  Inf Inf
 Preemption0.1  27.8760  Inf Inf
 Lognormal   0.21756  1.3473 4.7797  Inf Inf
 Zipf0.27724 -1.0959 4.9038  Inf Inf
 Mandelbrot  0.64175 -1.3825  1  4.9181  Inf Inf
 
 I read from the manual that to see which models fits better you use the AIC 
 values. 
 What is the meaning of getting infinite?

It seems you can get infinite AIC and BIC when the distribution you selected is 
in conflict with the nature of your data. For instance, when you postulate a 
Poisson model (like in your case), but your data are not integers (counts). Was 
that the case with you? Distribution families that go along with non-integer 
(real) data are gaussian and Gamma. You can neither use quasipoisson nor other 
quasi models, because these do not have AIC. 

 Can I use the Deviance value to compare the models?
 And in case I can use the deviance, since there are very close values, should 
 I run a test to see if the differences are significant?� in that case, which  
 one?.

The models have different numbers of estimated parameters and they are not 
nested. Many people claim that you can use neither deviance nor AIC (and they 
are right). At least you must take into account the number of estimated 
parameters that varies from zero to three.

Cheers, Jari Oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Mean distance values (and errors) between groups of samples using vegan

2013-12-16 Thread Jari Oksanen
Function meandist() should do this. Its name is kind of hint.

Cheers, Jari Oksanen
On 16/12/2013, at 21:16 PM, Andres Mellado Diaz wrote:

 Dear list memnbers,
 
 I would greatly appreciate any suggestion on how to get averaged distance 
 values (and errors) between groups of samples using vegan (vegdist, dist,...).
 
 As an example, I have a faunal table (samples by taxa, mMCINVmmh) and some 
 grouping factors (for example: gradreg_mmh) that I want to test, like this:
 
 
 vegdist(mMCINVmmh, method=bray, binary=FALSE, diag=F, upper=F)-vegdist1
 
 vegdist1
 
  TR_1_2TR_1_3TR_2_2TR_2_3TR_3_1TR_3_2
 TR_1_3 0.7361704
 TR_2_2 0.1509007 0.7785157
 TR_2_3 0.6465259 0.4095644 0.6832353
 TR_3_1 0.8717468 0.9572968 0.8551546 0.9373980
 TR_3_2 0.6925163 0.7197110 0.7428020 0.4742819 0.7800620
 TR_3_3 0.8166542 0.7199418 0.8365397 0.6215122 0.8582932 0.5364242
 
 gradreg_mmh
 
 [1] 0 0 3 3 5 5 5
 
 Levels: 0 3 5
 
 
 
 So I want to get the pairwise mean distances and errors between groups 0, 3 
 and 5...
 
 
 many thanks in advance,
 
 Andrés
 
 -- 
 Andrés Mellado Díaz
 
 Centre for Hydrological Studies CEH-CEDEX
 Water Quality Department
 Pº bajo de la Virgen del Puerto, 3
 28005, Madrid
 SPAIN
 
 
 -- 
 Andrés Mellado Díaz
 
 Centre for Hydrological Studies CEH-CEDEX
 Water Quality Department
 Pº bajo de la Virgen del Puerto, 3
 28005, Madrid
 SPAIN
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] NA error in envfit

2013-12-06 Thread Jari Oksanen

On 05/12/2013, at 18:42 PM, Dixon, Philip M [STAT] wrote:
 
 I wonder if the problem is a factor level with no observations.  One of the 
 frustrating things about factors (class variables) in R is that the list of 
 levels is stored separately from the data.  This can cause all sorts of 
 problems if you create the factor, then subset the data, and the subset is 
 missing one or more levels of the factor.  You are subsetting your data, so 
 this may be the source of the problem.
 
 My working philosophy is to keep variables as character strings or numbers 
 until just before I need the factors.  That avoids any issues with extraneous 
 levels.  That means reading data sets (.txt or .csv files) with as.is=TRUE to 
 avoid default creation of factors.  relevel() may recreate the list of 
 levels.  I usually use factor(as.character(variable)) to flip a factor to a 
 vector of character strings then back to a factor with the correct set of 
 levels.

Philip,

It very much look that this kind of approach is the source of all evil. We 
*assume* in envfit() that if a variable is not a factor, then it is numeric. If 
it is a character string instead being numeric, you get those strange error 
messages. We do take care of the extraneous factor levels in envfit, but we 
expected that variables are either factors or numeric -- we did not expect 
character strings. I guess we have to add some ugly code to handle these cases 
and either cast character strings to factors or ignore variables that are 
neither numeric nor factors.

Cheers, Jari Oksanen
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] NA error in envfit

2013-12-06 Thread Jari Oksanen
Kendra,

Are you sure that it was a factor? I am unable to trigger an error with a 
one-level factor in vegan 2.0-9. Moreover, the error message you sent was from 
vectorfit and factors (also one-level factors) are not handled in that function 
but they go to factorfit, and error should come from factorfit.  Character 
strings  variable go to vectorfit (instead of factorfit) and gives exactly that 
error and from vectorfit. 

I don't ask these things out of my meanness, but I want to fix these functions 
for the next release. I have now found one problem and I have fixed that in 
R-Forge. If there are some other problems, I want to fix them, too. Therefore I 
really want to know what happened with your application. I try to reproduce 
your problems, but this is kind of blind watchmaker's works as I don't have a 
reproducible test case. Therefore I have to ask stubbornly.

Cheers, Jari Oksanen
On 06/12/2013, at 21:35 PM, Mitchell, Kendra wrote:

 My offending variable was correctly imported as a factor, but since I was 
 subsetting the data to look only at one zone at a time it was a factor with 
 only one level
 
 
 --
 Kendra Maas Mitchell, Ph.D.
 Post Doctoral Research Fellow
 University of British Columbia
 604-822-5646
 
 
 From: Gavin Simpson [ucfa...@gmail.com]
 Sent: Friday, December 06, 2013 11:09 AM
 To: Dixon, Philip M [STAT]
 Cc: Mitchell, Kendra; r-sig-ecology@r-project.org
 Subject: Re: [R-sig-eco] NA error in envfit
 
 Phillip,
 
 You approach to using factors misses an important consideration; the
 class that was observed in the full dataset should not disappear just
 because you subsetted the data in some manner. Also, `droplevels()` is
 a useful function to call on a factor or data frame if subsetting
 produces levels with zero observations and if that information is not
 made use on in whatever computations follow next.
 
 G
 
 On 5 December 2013 10:42, Dixon, Philip M [STAT] pdi...@iastate.edu wrote:
 Kendra,
 
 I wonder if the problem is a factor level with no observations.  One of the 
 frustrating things about factors (class variables) in R is that the list of 
 levels is stored separately from the data.  This can cause all sorts of 
 problems if you create the factor, then subset the data, and the subset is 
 missing one or more levels of the factor.  You are subsetting your data, so 
 this may be the source of the problem.
 
 My working philosophy is to keep variables as character strings or numbers 
 until just before I need the factors.  That avoids any issues with 
 extraneous levels.  That means reading data sets (.txt or .csv files) with 
 as.is=TRUE to avoid default creation of factors.  relevel() may recreate the 
 list of levels.  I usually use factor(as.character(variable)) to flip a 
 factor to a vector of character strings then back to a factor with the 
 correct set of levels.
 
 Best wishes,
 Philip Dixon
 
 
[[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 
 
 
 --
 Gavin Simpson, PhD
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] NA error in envfit

2013-12-05 Thread Jari Oksanen
Hello,

I think I saw something like this in autumn (northern hemisphere) when a 
variable had constant values with no variation, and envfit did not know how to 
scale the arrow. 

We fixed this in the development version of vegan in R-Forge on September 29. 
Unfortunately R-Forge is again dysfunctional and cannot build the package, but 
if you are able to do that yourself you can try to see if the problem is fixed 
there. The same files are also in github, but you need to build the package 
yourself there too. I'm working with a minor release of vegan (2.0-10) which 
may be published on Monday 9 Dec, but there are no guarantees that it will have 
this envfit fix or be released like planned (you know, the best laid plans of 
mice and men...) 

It may be easiest to see if a constant variable is the culprit, and remove 
that if needed. If this is not the case, we need more info and a *reproducible* 
example. We haven't got any now, and I cannot reproduce your problem.

Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] 
on behalf of Stephen Sefick [sas0...@auburn.edu]
Sent: 04 December 2013 22:01
To: Mitchell, Kendra
Cc: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] NA error in envfit

Kendra,

Something is wrong in X or P; find out what the foreign function call is
  and then you may be able to track down the offending data problem.
Maybe a logarithm somewhere? This is probably not much help; I don't
have much experience with envfit.

Stephen

On 12/03/2013 07:06 PM, Mitchell, Kendra wrote:
 I'm running a bunch of NMS with vectors fitted (slicing and dicing a large 
 dataset in different ways).  I'm suddenly getting an error  from envfit

 f.bSBS.org.fit-envfit(f.bSBS.org.nms, f.bSBS.org.env, permutations=999, 
 na.rm=TRUE)

 Error in vectorfit(X, P, permutations, strata, choices, w = w, ...) :
NA/NaN/Inf in foreign function call (arg 1)
 In addition: Warning message:
 In vectorfit(X, P, permutations, strata, choices, w = w, ...) :
NAs introduced by coercion

 I can plot the NMS and even run ordifit on individual env variables, so can't 
 figure out what the problem is.   There aren't any NA/NaN/Inf in either of 
 those data that I can find.  I've tried running it without na.rm=TRUE and 
 still get the error.  Guidance on how to fix this would be appreciated.

 Here's the whole slicing process and str for the data


 f.bSBS.org-f.env$zone.hor==bSBS.1
 f.bSBS.org.tyc-f.tyc[f.bSBS.org,f.bSBS.org]
 f.bSBS.org.env-subset(f.env, f.env$zone.hor==bSBS.1)
 f.bSBS.org.nms-metaMDS(as.dist(f.bSBS.org.tyc), k=3, trymin=50, trymax=250, 
 wascores=FALSE)
 f.bSBS.org.fit-envfit(f.bSBS.org.nms, f.bSBS.org.env, permutations=999, 
 na.rm=TRUE)


 str(f.bSBS.org.env)
 'data.frame':63 obs. of  14 variables:
   $ zone : Factor w/ 6 levels bIDF,bSBS,..: 2 2 2 2 2 2 2 2 2 2 
 ...
   $ site : Factor w/ 18 levels A7,A8,A9,..: 12 12 12 12 12 12 
 12 12 12 12 ...
   $ om   : Factor w/ 4 levels 0,1,2,3: 2 2 2 3 3 3 2 2 2 3 ...
   $ compaction   : num  1 1 1 1 1 1 1 1 1 1 ...
   $ herbicide: num  0 0 0 0 0 0 0 0 0 0 ...
   $ horizon  : Factor w/ 2 levels 1,2: 1 1 1 1 1 1 1 1 1 1 ...
   $ Water_content: num  50.3 50.3 50.3 50.1 50.1 ...
   $ DNA_ug_g : num  71.2 71.2 71.2 68.6 68.6 ...
   $ C: num  30.5 30.5 30.5 28.4 28.4 ...
   $ N: num  0.863 0.863 0.863 0.81 0.81 ...
   $ pH_H2O   : num  4.63 4.63 4.63 4.49 4.49 ...
   $ CN   : num  35.3 35.3 35.3 35.1 35.1 ...
   $ f.env$zone   : Factor w/ 6 levels bIDF,bSBS,..: 2 2 2 2 2 2 2 2 2 2 
 ...
   $ zone.hor : chr  bSBS.1 bSBS.1 bSBS.1 bSBS.1 ...

 str(f.bSBS.org.nms)
 List of 35
   $ nobj  : int 63
   $ nfix  : int 0
   $ ndim  : num 3
   $ ndis  : int 1953
   $ ngrp  : int 1
   $ diss  : num [1:1953] 0.00424 0.00437 0.05169 0.07522 0.11039 ...
   $ iidx  : int [1:1953] 12 8 55 56 52 7 56 12 59 52 ...
   $ jidx  : int [1:1953] 7 6 18 55 8 3 18 3 12 49 ...
   $ xinit : num [1:189] 0.654 0.837 0.438 0.105 -0.313 ...
   $ istart: int 1
   $ isform: int 1
   $ ities : int 1
   $ iregn : int 1
   $ iscal : int 1
   $ maxits: int 200
   $ sratmx: num 1
   $ strmin: num 1e-04
   $ sfgrmn: num 1e-07
   $ dist  : num [1:1953] 0.0679 0.0231 0.3598 0.1248 0.1422 ...
   $ dhat  : num [1:1953] 0.0455 0.0455 0.2076 0.2076 0.2076 ...
   $ points: num [1:63, 1:3] -0.1256 0.1224 0.267 0.2374 -0.0427 ...
..- attr(*, dimnames)=List of 2
.. ..$ : chr [1:63] LL001 LL002 LL003 LL007 ...
.. ..$ : chr [1:3] MDS1 MDS2 MDS3
..- attr(*, centre)= logi TRUE
..- attr(*, pc)= logi TRUE
..- attr(*, halfchange)= logi FALSE
   $ stress: num 0.157
   $ grstress  : num 0.157
   $ iters : int 180
   $ icause: int 3
   $ call  : language metaMDS(comm = as.dist(f.bSBS.org.tyc), k = 3, 
 trymax = 250, wascores = FALSE

Re: [R-sig-eco] NA error in envfit

2013-12-05 Thread Jari Oksanen
It is easy if you have C and Fortran compilers plus unix tools. I assume most 
people do not have those. Then 'easy' is quite different a concept.

Cheers, Jari Oksanen
 alkuperäinen viesti 
Lähettäjä: Hadley Wickham
Lähetetty:  05.12.2013, 16:19
Vastaanottaja: Eduard Szöcs
Kopio: r-sig-ecology@r-project.org
Aihe: Re: [R-sig-eco] NA error in envfit

 # install vegan from github
 install_github('vegan', 'jarioksa')

BTW I recommend using this form:

install_github('jarioksa/vegan')

Hadley

--
http://had.co.nz/

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Community composition variance partitioning?

2013-12-04 Thread Jari Oksanen
Hi,

Not only odd, but impossible. If you have a model y ~ x1, and you *add* a new 
explanatory variable, you cannot get worse in raw R2. You can get worse in 
adjusted R2. You can also get worse if you add variables to a matrix for which 
you calculate distances. So dist(y) ~ dist([x1]) can have higher R2 than 
dist(y) ~ dist([x1,x2]) -- bioenv is based on this.

Cheers, Jari Oksanen

Sent from my iPad

 On 4.12.2013, at 20.19, Sarah Goslee sarah.gos...@gmail.com wrote:
 
 Hi,
 
 That seems a bit odd: can you provide a reproducible example, off-list
 if necessary?
 
 Sarah
 
 
 
 On Wed, Dec 4, 2013 at 12:50 PM, Alexandre Fadigas de Souza
 alexso...@cb.ufrn.br wrote:
 Dear friends,
 
   My name is Alexandre and I am trying to analyze a dataset on floristic 
 composition of tropical coastal vegetation by means of variance partition, 
 according to the outlines of a Tuomisto's recent papers, specially
 
 Tuomisto, H., Ruokolainen, L., Ruokolainen, K., 2012. Modelling niche and 
 neutral dynamics : on the ecological interpretation of variation 
 partitioning results. Ecography (Cop.). 35, 961–971.
 
   I have a doubt, could you please give your opinion on it?
 
   We are proceeding a variance partition of the bray-curtis floristic 
 distance using as explanatory fractions soil nutrition, topography, canopy 
 openess and geographical distances (all as euclidean distance matrices).
 
 We are using the MRM function of the ecodist package:
 
 mrm - MRM(dist(species) ~ dist(soil) + dist(topograph) + dist(light) + 
 dist(xy), data=my.data, nperm=1
 
 The idea is that the overall R2 of this multiple regression should be used 
 to assess the contributions of the spatial and environmental fractions 
 through subtraction :
 
 Three separate multiple regression analyses are needed
 to assess the relative explanatory power of geographical
 and environmental distances. All of these have the same
 response variable (the compositional dissimilarity matrix),
 but each analysis uses a diff erent set of the explanatory
 variables. In these analyses the explanatory variables are:
 (I) the geographical distance matrix only, (II) the environmental
 diff erence matrices only, and (III) all the explanatory
 variables used in (I) or (II). Comparing the R 2 values
 from these three analyses allows partitioning the variance
 of the response dissimilarity matrix to four fractions.
 Fraction A is explained uniquely by the environmental
 diff erence matrices and equals R2 (III) R2 (I). Fraction B
 is explained jointly by the environmental and geographical
 distances and equals R2 (I) R2 (II) R2 (III). Fraction C
 is explained uniquely by geographical distances and
 equals R2 (III) R2 (II). Fraction D is unexplained by the
 available environmental and geographical dissimilarity
 matrices and equals 100% R2 (III) (throughout the present
 paper, R2 values are expressed as percentages rather
 than proportions). [Tuomisto et al. 2012]
 
 The problem is that the R2 of the overall model (containing all the 
 explanatory variables) is smaller than most of the R2 of models containing 
 each of the explanatory matrices. So it seems not possible to proceed with 
 the approach proposed.
 
 
Sincerely,
 
Alexandre
 
 Dr. Alexandre F. Souza
 Professor Adjunto II Departamento de Botanica, Ecologia e Zoologia  
 Universidade Federal do Rio Grande do Norte (UFRN)  
 http://www.docente.ufrn.br/alexsouza  Curriculo: 
 lattes.cnpq.br/7844758818522706
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 
 
 
 -- 
 Sarah Goslee
 http://www.stringpage.com
 http://www.sarahgoslee.com
 http://www.functionaldiversity.org
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] NA in species scores using capscale in vegan

2013-11-07 Thread Jari Oksanen

On 07/11/2013, at 12:05 PM, Duarte Viana wrote:

 Hello all,
 
 I am doing a CAP (or dbRDA) in vegan, using the function capscale, where
 the response is a dissimilarity matrix (Sorensen's distance index) and the
 explanatory matrix consists of 19 variables. The code runs apparently fine,
 but the species scores appear as NA (site scores are properly given).
 
 Does anyone knows what might be the problem?
 
Duarte,

Did you supply species data? Species scores cannot be estimated without 
information on species. If your response is a dissimilarity, there is no 
information about the original species. You can supply the original community 
data (which has the information on species) as argument 'comm' in capscale().

Cheers, Jari Oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Multivariate Analyses of Ecological Communities

2013-11-04 Thread Jari Oksanen

On 04/11/2013, at 17:59 PM, Rich Shepard wrote:

  I've read the docs for anosim, adonis, and mrpp and am currently reading
 the vegan tutorial. Toward the end of the anosim and mprr docs the author
 writes that each has potential shortcomings and recommends the adonis
 package be used instead. This leads to my two questions:
 
  1) Why are anosin and mrpp packages still maintained and included? That
 they are suggests question

To serve the people?

 
  2) Under what circumstances are these two packages appropriately used?
 
I don't know, but since they are available, it is possible to find out.

I have suggested in the documentation that adonis() usually is better, but it 
also suffers from similar problems as these alternatives. My analysis suggests 
that adonis() is more robust and reliable. However, not everybody agrees, and 
alternatives are provided for these standard methods. Since the alternatives 
are provided, you can study the issue independently.

The existence of a method is not a sufficient reason to use that method.  

Cheers, Jari Oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Cross validate model with calibrate

2013-11-02 Thread Jari Oksanen
Dear Philipp,

I cannot see anything wrong in your approach. The only problem is that RDA may 
not do very well. It is very common that the calibrated values are outside the 
original range of the data and if the predictions are not very good, they can 
also be unrealistic values. RDA has no idea what is realistic for given data. 

You get cross-validated values outside data range even in the Dune meadow data 
you used. Here an example that applies 4-fold cross validation (in line with 
3/4 subsetting in your example):

library(vegan)
data(dune, dune.env)
vec - rep(1:4, length=nrow(dune))
pred - rep(NA, length=nrow(dune))
set.seed(4711)
take - sample(vec)
for(i in 1:4) pred[take==i] - calibrate(rda(dune ~ A1, dune.env, subset = take 
!= i), newdata = subset(dune, take==i))
range(pred)  
## [1]  0.9104486 12.3583826
with(dune.env, range(A1))
## [1]  2.8 11.5

The predicted values extend nearly 2cm below and 1cm above the data values. In 
this case the predicted values were not negative, but that could happen, too. 
The tool (RDA) may not work very well in all cases, but RMSE will tell you how 
good RDA is for the job. If it is bad, do not use it. It seems RDA is very poor 
in this dune example: RMSE is larger than standard deviation of A1 (but you 
could almost guess this by looking at the low constrained eigenvalues of the 
model).

The code above uses the 'subset' argument of vegan::rda and the subset() 
function of R to make the code a bit easier to write.

Cheers, Jari Oksanen


On 01/11/2013, at 15:53 PM, Philipp01 wrote:

 Hello all,
 
 I would like to cross validate my rda() derived model with the calibrate
 function (vegan package) and calculate the RMSE as value for performance
 measure. 
 For simplicity I use the example from the predict.cca {vegan} help.
 
 library(ade4)
 library(vegan)
 
 data(dune)
 data(dune.env)
 
 nbr - as.numeric(rownames(dune))
 
 library(caret)
 inTrain - createDataPartition(y=nbr, p=3/4, list=F, times=1)
 
 train.dune - dune[inTrain[,i],];
 test.dune  - dune[-inTrain[,i],];
 
 train.dune.env - dune.env[inTrain[,i],];
 test.dune.env  - dune.env[-inTrain[,i],];
 
 
 mod  - rda(train.dune ~ A1, train.dune.env)
 cal - calibrate(mod, newdata=test.dune)
 
 with(test.dune.env, plot(A1, cal[,A1] - A1, ylab=Prediction Error))
 abline(h=0)
 
 error - cal - test.dune.env$A1
 (rmse - sqrt(mean(error^2)))
 
 When I apply this code snippet to my very own data I get positive and
 negative cal values, which would be unrealistic for parameters such as
 tree height (etc.). Therefore, I doubt that my approach is correct. How do
 you compute the RMSE for the rda() derived model?
 
 Regards, Philipp 
 
 
 
 
 --
 View this message in context: 
 http://r-sig-ecology.471788.n2.nabble.com/Cross-validate-model-with-calibrate-tp7578486.html
 Sent from the r-sig-ecology mailing list archive at Nabble.com.
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] angular statistics

2013-10-18 Thread Jari Oksanen
If you use these, remember that R cos() needs argument in radians.

Cheers, Jari Oksanen

Sent from my iPad

 On 18.10.2013, at 8.40, Ivailo ubuntero.9...@gmail.com wrote:
 
 On Fri, Oct 18, 2013 at 6:16 AM, Michael Marsh sw...@blarg.net wrote:
 
 If you want a measure of exposure, i. e., heat, I suggest using the 
 heatload transformation suggested by McCune and Grace (2002). Their 
 assumption is that mid-afternoon, when the sun is in the southwest, is 
 usually the warmest time of day. The formula at the end of Chapter 3 follows:
 
 heat load index=(1-cos(degrees-45))/2
 
 McCune, Bruce and James B. Grace. 2002. Analysis of ecological communities. 
 MJM Software Design. Gleneden Beach, Oregon. USA
 
 Thanks for the interesting discussion!
 
 I'd like to add that although I don't have the book, I found the
 radiation measures presented in the following paper:
 McCune, B. and D. Keon. 2002. Equations for potential annual direct
 incident radiation and heat load. Journal of Vegetation Science
 13:603–606.
 
 Cheers,
 Ivailo
 -- 
 UBUNTU: a person is a person through other persons.
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] vegan RsquareAdj() for lm models

2013-10-03 Thread Jari Oksanen
Specific reason is that nobody has implemented this. These things don't come by 
automatic writing, but somebody must do them.

What would you expect to get? Is this what was on your mind:

 sapply(summary(lm(yy~xx-1)), function(x) c(r.squared = x$r.squared, 
 adj.r.squared = x$adj.r.squared))
  Response Y1 Response Y2 Response Y3 Response Y4
r.squared  0.06845032  0.04788037  0.01702738  0.11253059
adj.r.squared -0.01255400 -0.03491264 -0.06844849  0.03535934


This could be implemented, but (1) is this what you or anybody else would like 
to have?, (2) how many things would it break by returning several values 
instead of one?

If you want to have this, you really do not need to use vegan. 
vegan:::RsquareAdj.lm() takes its results from summary(lm_object). You can 
use that stats:::summary.lm directly.

Cheers, Jari Oksanen


From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] 
on behalf of Paolo Piras [paolo.pi...@uniroma3.it]
Sent: 03 October 2013 14:59
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] vegan RsquareAdj() for lm models

Dear list,
I would like to easily compute the adjusted R-square in a lm model without 
intercept (excluding the intercept is essential for my analysis)

I found that RsquareAdj() in vegan returns NA if the argument is  a 
multiple-multivariate lm model thus including multivariate responses and 
multiple predictors, while it works for univariate response and multiple 
predictors.

For example:
library(vegan)

yy-matrix(rnorm(200,0,1),ncol=4)
xx-matrix(rnorm(200,0,1),ncol=4)
RsquareAdj(lm(yy~xx-1))
RsquareAdj(lm(yy[,1]~xx-1))



There some specific reason for this behavior?
Thanks in advance for any advice
best regards
Poalo






___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] vegan RsquareAdj() for lm models

2013-10-03 Thread Jari Oksanen
Paolo,

See ?RsquareAdj for the call interface. The default method can be called as 
RsquareAdj(x, n, m), and in the default method x is the unadjusted correlation, 
n is the number of observations and m is the number of parameters (degrees of 
freedom) in the fitted model. Specific methods for univariate lm or for rda 
(and some others) find these variables in the result object, but then they just 
call the default method with the found x, n and m. You can build your model on 
that. It is possible to build a specific function for mlm objects, but nobody 
has done so in vegan.

You cannot build an rda design without an intercept. It was a conscious design 
decision to make this impossible without hacking the rda.default code (I even 
say this in decision-vegan vignette). I am not going to make easy to have 
non-centred RDA: I care too much about people and i don't want to do evil. If 
you really *know* that you need non-centred RDA. then you know how to change 
those lines of code in rda.default. 

Cheers, Jari Oksanen

From: Paolo Piras [paolo.pi...@uniroma3.it]
Sent: 03 October 2013 15:52
To: Jari Oksanen; r-sig-ecology@r-project.org
Subject: RE: [R-sig-eco] vegan RsquareAdj() for lm models

Thankyou very much Jari!
I think that it is nearly ok
what I would like to have is the same as in
RsquareAdj(vegan::rda(yy,xx))

that is a GLOBAL measure of the association
BUT...I want it for a multiple-multivariate lm model that does not include the 
intercept;
an alternative could be to build a rda design for the exclusion of intercept 
but I really cannot figure out how to do it.

I think I just need to compute the average of single adjusted r squared from 
the output of your line of code,
But the results are not identical
EXAMPLE  WITH INTERCEPT IN ORDER TO COMPARE WITH RDA

RsquareAdj(vegan::rda(yy,xx))
mean(sapply(summary(lm(yy~xx)), function(x) c(r.squared = x$r.squared, 
adj.r.squared = x$adj.r.squared))[2,])

Or I just miss something in this computation

Thanks again for any further suggestion



Da: Jari Oksanen jari.oksa...@oulu.fi
Inviato: giovedì 3 ottobre 2013 14.25
A: Paolo Piras; r-sig-ecology@r-project.org
Oggetto: RE: [R-sig-eco] vegan RsquareAdj() for lm models

Specific reason is that nobody has implemented this. These things don't come by 
automatic writing, but somebody must do them.

What would you expect to get? Is this what was on your mind:

 sapply(summary(lm(yy~xx-1)), function(x) c(r.squared = x$r.squared, 
 adj.r.squared = x$adj.r.squared))
  Response Y1 Response Y2 Response Y3 Response Y4
r.squared  0.06845032  0.04788037  0.01702738  0.11253059
adj.r.squared -0.01255400 -0.03491264 -0.06844849  0.03535934


This could be implemented, but (1) is this what you or anybody else would like 
to have?, (2) how many things would it break by returning several values 
instead of one?

If you want to have this, you really do not need to use vegan. 
vegan:::RsquareAdj.lm() takes its results from summary(lm_object). You can 
use that stats:::summary.lm directly.

Cheers, Jari Oksanen


From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] 
on behalf of Paolo Piras [paolo.pi...@uniroma3.it]
Sent: 03 October 2013 14:59
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] vegan RsquareAdj() for lm models

Dear list,
I would like to easily compute the adjusted R-square in a lm model without 
intercept (excluding the intercept is essential for my analysis)

I found that RsquareAdj() in vegan returns NA if the argument is  a 
multiple-multivariate lm model thus including multivariate responses and 
multiple predictors, while it works for univariate response and multiple 
predictors.

For example:
library(vegan)

yy-matrix(rnorm(200,0,1),ncol=4)
xx-matrix(rnorm(200,0,1),ncol=4)
RsquareAdj(lm(yy~xx-1))
RsquareAdj(lm(yy[,1]~xx-1))



There some specific reason for this behavior?
Thanks in advance for any advice
best regards
Poalo






___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] null model for testing nestedness

2013-09-25 Thread Jari Oksanen
Valerie,

There are at least two problems here: the way you call oecosimu() and how 
nestdnodf(..., weighted =TRUE) works with binary data. 

If you specify a *binary* null model as method, then you will get binary data. 
Even if you supplied quantitative data, they are transformed into 1/0 
(presence/absence) data. You specified method = quasiswap, and that is binary 
model. Another problem is that nestednodf(..., weighted = TRUE) seems to 
evaluate the statistics all as zeros if you request weighted (= quantitative 
data) analysis of non-quantitative data (binary). It cannot perform weighted 
analysis if there are no weights, but still I think it should return something 
else than zeros. We'll have a look at that issue. 

You should specify a non-binary null model if you want to have a non-binary 
(weighted) analysis. Quantitative null models are problematic, and vegan 
release version does not have much choice here. I think r2dtable may be the 
only one. Development version of vegan in http://www.r-forge.r-project.org/ has 
a wider gamme of non-binary null models, but I think you need to be brave to 
use quantitative null models. They are something for people who are not afraid 
of going to areas where angels fear to tread.

FWIW, weighted nestednodf seems to work in oecosimu if you ask for a 
quantitative nullmodel (r2dtable in my tests) both with the release version 
(2.0-8 or 2.0-9) and with the development version (2.1-35 or 2.1-36). But you 
really need to to specify a quantitative null model. Both null models and 
oecosimu are completely re-written and re-designed in development versions.

Cheers, Jari Oksanen

On 25/09/2013, at 15:56 PM, v_coudr...@voila.fr
 wrote:

 Thank you very much. Yes it is working with oecosimu, exept that it does not 
 seem to work for weighted data. There is the possibility to specify weighted 
 = TRUE: 
 
 oecosimu(matrix,nestednodf, method = quasiswap, nsimul = 999, order = 
 FALSE, weighted =TRUE)
 
 However, I get only null values and p=1. For weighted = F, I get good values.
 
 Best wishes
 ___
 Quiz TV : Vous êtes fan de la série Friends ? 5 questions ici 
 http://tv.voila.fr/quiz/quiz-special-friends_14538959.html
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] subsetting lower triangle distance matrix based on variable in another object

2013-07-30 Thread Jari Oksanen
Kendra,

Subset a matrix an convert to dist. You must subset both rows and columns in 
the same way, and therefore you must use subsetting index instead of a subset() 
function. Let k be a vector that is TRUE for selected items; 
as.dist(yoursymmetricmatrix[k, k]) is what you are looking for.

Cheers, Jari Oksanen
 alkuperäinen viesti 
Lähettäjä: Mitchell, Kendra
Lähetetty:  31.07.2013, 02:38
Vastaanottaja: r-sig-ecology@r-project.org r-sig-ecology@r-project.org
Aihe: [R-sig-eco] subsetting lower triangle distance matrix based on variable 
in another object

I have a number of dissimilarity matrices built in another program that I'd 
like to subset then run NMS.

#read in distance matrix
b_bc03 -data.matrix(read.table(bac_final.an.thetayc.0.03.lt.ave.dist, 
row.names=1, header=T, fill=TRUE))  # creates a 736x736 double matrix

#read in environmental variables
 bac_env-read.csv(bac_samples.csv, row.names=1, header=T)  # 736 obs. of 6 
variables

#The only way that I've figured out how to subset a lt matrix is to convert to 
dist object first
bc_dist-as.dist(b_bc03)  # dist[27048]
tx_dist-subset(bc_dist, bac_env$zone==TX) # numeric[54316]

Here is where it falls apart, the subset doesn't keep the labels, just the 
values so I can't convert it back to a distance matrix.  I've looked through 
ecodist but can't find anyway in that package to select only certain samples.

Help please?

thanks
Kendra



--
Kendra Maas Mitchell, Ph.D.
Post Doctoral Research Fellow
University of British Columbia
604-822-5646

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Heat Map for species - code from Numerical Ecology with R

2013-07-24 Thread Jari Oksanen
Vegemite is vegemite because it is so condensed. It is condensed because it 
uses only one column for data and no separators between columns. If you want to 
print your full data, just do so. You do not need any fancy functions for the 
purpose. If you want to reorder your data, learn how to use row and column 
indices. For vegemite you need one-character scaling, and vegemite provides 
some popular alternatives. You are not limited those, nor to numeric 
characters, but you can roll your own.

If you do not provide one-character data, vegemite gives an error message that 
is intended to be informative: it says what was the problem, and suggests what 
to do (use scale). See ?vegemite.

Vegan provides an interface to standard graphical heatmap. This sister of 
vegemite is called tabasco (because it is also condensed but hotter than 
vegemite). It takes any non-negative data, but you may need scaling to see 
shades of colours. Names can also be unreadable in large tables. Fixes are 
welcome. I think ade4 may have something similar.

Cheers, Jari Oksanen
 alkuperäinen viesti 
Lähettäjä: Basil Iannone
Lähetetty:  24.07.2013, 14:57
Vastaanottaja: Sarah Goslee
Kopio: r-sig-ecology@r-project.org
Aihe: Re: [R-sig-eco] Heat Map for species - code from Numerical Ecology with R

Valarie,

I too ran across this problem. vegmite operates on data with a single digit
such as Braun-Blanquet cover estimates. Thus vegemite command will not work
on data such as environmental data or biomass data unless you convert it to
a single digit, causing you to loose a lot of information. So in short,
vegmite is a method used to compile and arrange single digit cover
estimates into a table. If you find a similar function that works on other
kinds of data, please let me and the other R users know.

And someone please correct me if my assessment of the situation is
incorrect.


On Wed, Jul 24, 2013 at 6:14 AM, Sarah Goslee sarah.gos...@gmail.comwrote:

 Hi Valerie,

 Did the suggestions given on the R-help list fail to work?

 You do need to provide you abundance data as one character, as specified in
 ?vegemite which also gives suggestions for conversion.

 Sarah

 On Tuesday, July 23, 2013, Valerie Mucciarelli wrote:

  Hello,
 
  I am relatively new to R and I am working through the code that is
  provided in the book Numerical Ecology with R:
  http://xa.yimg.com/kq/groups/19243105/1919134110/name/Numerical.pdf (pg
  79)
 
   and I have run across an error message that I can't seem to figure out.
 
   I am using the vegan, ade4, gclus and cluster packages. The code is as
  follows:
 
   # Ordered community table
   # Species are ordered by their weighted averages on site scores
 
   or - vegemite(spe, spe.chwo)
 
   spe is the dataframe, here is part of it:
 
   AGA   ANT  BON   CAL1   CAL   CER   CRY   DES  EUTH FRY
   1  0.420 0.092 0.051 0.000 0.975 0.000 0.111 0.000 0.127 0
   2  0.000 0.000 0.007 0.002 0.915 0.000 0.000 0.000 0.151 0
   4  0.000 0.008 0.000 0.009 0.124 0.003 0.000 0.000 0.095 0
   7  0.000 0.002 0.003 0.002 0.121 0.002 0.000 3.573 0.180 0
   12 0.000 0.020 0.000 0.002 0.444 0.001 0.000 0.000 0.242 0
   13 8.727 0.000 0.000 0.000 0.743 0.000 0.000 0.000 0.050 0
   14 2.163 0.009 0.000 0.003 1.121 0.000 0.000 0.000 0.051 0
   15 0.000 0.004 0.000 0.000 0.109 0.000 0.000 0.000 0.007 0
   18 9.021 0.018 0.002 0.000 0.286 0.000 0.000 0.000 0.028 0
   19 0.000 0.038 0.000 0.019 0.509 0.000 0.000 0.000 0.155 0
 
   spe.chwo came from:
 
   spe.norm - decostand(spe, normalize)
   spe.ch - vegdist(spe.norm, euc)
   spe.ch.UPGMA - hclust(spe.ch, method = average)
   spe.chwo - reorder.hclust(spe.ch.UPGMA, spe.ch)
 
 
   and the error is:
 
   Error in vegemite(spe, spe.chwo) :
   Cowardly refusing to use longer than 1 char symbols:
   Use scale
 
   The data in the dataframe is biomass data recorded to 4 digits. I'm
  wondering if this code is not working because my data is more than one
  digit long.
 
   Any suggestions or insight on how to get this code to work would be
  greatly appreciated.
 
   Thank you,
 
   Val
 
 

 --
 Sarah Goslee
 http://www.stringpage.com
 http://www.sarahgoslee.com
 http://www.functionaldiversity.org

 [[alternative HTML version deleted]]

 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




--
Basil V. Iannone III, Ph.D.
University of Illinois at Chicago
Department of Biological Sciences (MC 066)
845 W. Taylor St.
Chicago, IL  60607-7060
Email: bian...@uic.edu
Phone: 312-355-0987
Fax: 312-413-2435
http://www2.uic.edu/~bianno2

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman

Re: [R-sig-eco] Heat Map for species - code from Numerical Ecology with R

2013-07-24 Thread Jari Oksanen
In some cases lousy email system mandates promoting piracy: i replys, the 
previous messages are treated as appendices that cannot be edited. This also 
means that you must top-post. Kudos goes to MicroSoft + Nokia.

Cheers, Jari Oksanen
 alkuperäinen viesti 
Lähettäjä: THIOULOUSE JEAN
Lähetetty:  24.07.2013, 16:59
Vastaanottaja: François Gillet
Kopio: ethics.report...@springer.com ethics.report...@springer.com; 
r-sig-ecology@r-project.org; Daniel Borcard
Aihe: Re: [R-sig-eco] Heat Map for species - code from Numerical Ecology with R

Hi François,

Note that Jari and yourself also encourage piracy in the same way ;)

Jean


Le 24 juil. 2013 à 15:16, François Gillet francois.gil...@univ-fcomte.fr
 a écrit :

 Valerie and Sarah,

 Thank you for encouraging piracy by posting the URL of an illegal PDF of
 our book :-(

 By the way, just look at ?vegemite and you'll find easily the solution to
 your problem: with the argument scale=log your numeric data will be
 converted to 1-char symbols.

 François

 ---
 Prof. *François Gillet*
 Université de Franche-Comté - CNRS
 UMR 6249 Chrono-environnement
 UFR Sciences et Techniques
 16, Route de Gray
 F-25030 Besançon cedex
 France
 http://chrono-environnement.univ-fcomte.fr/
 http://chrono-environnement.univ-fcomte.fr/spip.php?article530
 Phone: +33 (0)3 81 66 62 81
 iPhone: +33 (0)7 88 37 07 76
 Location: La Bouloie, Bât. Propédeutique, *-114L*
 ---
 Editor of* Plant Ecology and Evolution*
 http://www.plecevo.eu
 ---
 *
 ***


 2013/7/24 Sarah Goslee sarah.gos...@gmail.com

 Hi Valerie,

 Did the suggestions given on the R-help list fail to work?

 You do need to provide you abundance data as one character, as specified in
 ?vegemite which also gives suggestions for conversion.

 Sarah

 On Tuesday, July 23, 2013, Valerie Mucciarelli wrote:

 Hello,

 I am relatively new to R and I am working through the code that is
 provided in the book Numerical Ecology with R:
 (pg 79)

 and I have run across an error message that I can't seem to figure out.

 I am using the vegan, ade4, gclus and cluster packages. The code is as
 follows:

 # Ordered community table
 # Species are ordered by their weighted averages on site scores

 or - vegemite(spe, spe.chwo)

 spe is the dataframe, here is part of it:

 AGA   ANT  BON   CAL1   CAL   CER   CRY   DES  EUTH FRY
 1  0.420 0.092 0.051 0.000 0.975 0.000 0.111 0.000 0.127 0
 2  0.000 0.000 0.007 0.002 0.915 0.000 0.000 0.000 0.151 0
 4  0.000 0.008 0.000 0.009 0.124 0.003 0.000 0.000 0.095 0
 7  0.000 0.002 0.003 0.002 0.121 0.002 0.000 3.573 0.180 0
 12 0.000 0.020 0.000 0.002 0.444 0.001 0.000 0.000 0.242 0
 13 8.727 0.000 0.000 0.000 0.743 0.000 0.000 0.000 0.050 0
 14 2.163 0.009 0.000 0.003 1.121 0.000 0.000 0.000 0.051 0
 15 0.000 0.004 0.000 0.000 0.109 0.000 0.000 0.000 0.007 0
 18 9.021 0.018 0.002 0.000 0.286 0.000 0.000 0.000 0.028 0
 19 0.000 0.038 0.000 0.019 0.509 0.000 0.000 0.000 0.155 0

 spe.chwo came from:

 spe.norm - decostand(spe, normalize)
 spe.ch - vegdist(spe.norm, euc)
 spe.ch.UPGMA - hclust(spe.ch, method = average)
 spe.chwo - reorder.hclust(spe.ch.UPGMA, spe.ch)


 and the error is:

 Error in vegemite(spe, spe.chwo) :
 Cowardly refusing to use longer than 1 char symbols:
 Use scale

 The data in the dataframe is biomass data recorded to 4 digits. I'm
 wondering if this code is not working because my data is more than one
 digit long.

 Any suggestions or insight on how to get this code to work would be
 greatly appreciated.

 Thank you,

 Val



 --
 Sarah Goslee
 http://www.stringpage.com
 http://www.sarahgoslee.com
 http://www.functionaldiversity.org

[[alternative HTML version deleted]]

 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


   [[alternative HTML version deleted]]

 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] error in MetaMDS

2013-07-21 Thread Jari Oksanen

On 21/07/2013, at 03:02 AM, Elaine Kuo wrote:

 Hello
 
 I am trying to run MetaMDS but got the following error based on the code.
 Please kindly indicate the error source and thank you.
 
 Elaine
 
 Code
 # Compute the Simpson distance matrices for the dataset (3 hr using 4GB)
  library(betapart)
  dist.sim-beta.pair(dataR, index.family=sor)
  class(dist.sim)
 
  betasim.sim-dist.sim[[beta.sim]]
  class(betasim.sim)
 
 # NMDS
  library(vegan)
  nms - metaMDS(betasim.sim,trymax=100,zerodist=add,autotransform
 =FALSE,k=2)
 
 Error
 Error in if (any(autotransform, noshare  0, wascores)  any(comm  0)) {
 :
  missing value where TRUE/FALSE needed

Elaine,

I cannot reproduce this, and there is no information in the message to know 
what happens.

Do you have missing values in betasim.sim? Are betasim.sim really 
dissimilarities?

BTW, vegan has Simpson dissimilarity: use betadiver(x, sim).

Cheers, Jari Oksanen
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] forward selection RDA after controlling for constraints

2013-07-10 Thread Jari Oksanen

On 10/07/2013, at 21:00 PM, Stephen Sefick wrote:

 Hello all,
 
 I would like to run this by everyone and maybe get some hints as to what R 
 functions I could use for this.  Ok, so I have macroinvertebrate assemblage 
 data from across the SE.  I would like to control for geographic distance 
 (lat/long), Watershed area, and year before submitting these data to an RDA 
 with the rest of the environmental data using a variable selection technique.
 
 Does it make sense to detrend the data using a mlm on hellinger transfomed 
 abundances with the above env variables as regressors and then submit the 
 residuals to rda with the rest of the env variables I am interested in?


Stephen,

If you happen to use vegan functions for forward selection, please note that 
they all (should) take a scope argument that can (should) be a list of lower 
and upper scopes. Put your controlled variables (distance???, watershed area, 
year) in the lower scope and these plus other candidate variables in the upper 
scope, and there you go. I have used should, because I have rarely used these 
functions myself, and I'm not sure if lower scope really is implemented in all, 
but is *should* be: file a bug report if this fails. 

I have no idea how to have distance RDA. Well, I have ideas, but none that I 
have are very good.

Using separate mlm and modelling residuals will not work quite correctly, 
because that ignores correlations between groups of variables. Vegan functions 
do not ignore those.

Cheers, Jari Oksanen
-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] anova.cca question / missing data in constraining matrix

2013-06-04 Thread Jari Oksanen
Dear Colleen, 
On 03/06/2013, at 22:32 PM, ckellogg wrote:

 Hello Jari,
 Thank you for your help with this.  The solution you suggested in your
 second post worked quite well.  However, i think another subset of my data
 is too 'holey', because when I run CCA on this set of environmental
 variables (or the a CCA with the previous environmental variables and the
 additional ones), I get an error: 
 
 toolik250.cca2
 -cca(toolikotus250.ra~logtemp+conductivity+pH+logBacProd+DIC+logDCO2+sqrtDCH4+logDOC+sqrtPhosphate+sqrtNitrate+sqrtTDN+sqrtTDP+logPC+logPN+Ca+Mg+logNa+logK+SO4+logChloride+Silica,toolikenv.s,
 na.action=na.exclude)
 Error in predict.cca(x, newdata = excluded, type = wa, model = CA) : 
  model “CA” has rank 0
 
 The CCA runs if I use na.action=na.omit, but then when I run the anovas,
 there is apparently no residual component.  For example,
 No residual component
 
 Model: cca(formula = toolikotus250.ra ~ logtemp + conductivity + pH +
 logBacProd + DIC + logDCO2 + sqrtDCH4 + logDOC + sqrtPhosphate + sqrtNitrate
 + sqrtTDN + sqrtTDP + logPC + logPN + Ca + Mg + logNa + logK + SO4 +
 logChloride + Silica, data = toolikenv.s, na.action = na.omit, subset =
 -toolik250.cca2$na.action)
 Df  Chisq F N.Perm Pr(F)
 Model12 5.30030   
 Residual  0 0.   
 
Yes, probably too many holes. You have no residual variation which indicates 
that the number
of predictor variables (constraints) is higher than the number of remaining 
observations. 

Cheers, Jari Oksanen

 So, I am thinking that examining the relationship between the microbial
 community and this subset of environmental variables might not be possible
 without my first manually curating which samples and variables should be
 included, correct?
 
 Thank you,
 Colleen
 
 
 
 --
 View this message in context: 
 http://r-sig-ecology.471788.n2.nabble.com/anova-cca-question-missing-data-in-constraining-matrix-tp7578175p7578179.html
 Sent from the r-sig-ecology mailing list archive at Nabble.com.
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] anova.cca question / missing data in constraining matrix

2013-06-01 Thread Jari Oksanen

On 01/06/2013, at 05:20 AM, Jari Oksanen wrote:
 
 The CCA seems to run just fine, but when I attempt to do the posthoc tests
 such as anova.cca (anova(toolik250.cca,by='terms',perm=999), I get an error
 message: Error in anova.ccabyterm(object, step = step, ...) : number of
 rows has changed: remove missing values?  What exactly is occurring here to
 cause this error - I suspect it must be related to the fact that the
 environmental data matrix has a lot of missing data.  I don't quite
 understand why it states that the number of rows has changed...changed from
 what?  
 
 The number of rows has changed from term to term. That is, you have different 
 numbers of missing values in each term (= explanatory variable), and when 
 rows with missing values are removed for the current model, the accepted 
 observations change from term to term. I admit the error message is not the 
 most obvious one. I must see where it comes from, and how to make it more 
 informative. However, it does give a hint to remove missing values, doesn't 
 it?
 
 If you want to have a term-wise test with missing values in terms, you must 
 refit your model for complete.cases.  Use argument 'subset' to select a 
 subset with no missing values. Currently I don't know any nice short cut to 
 do this with the current mode, but the following may work (untested), 
 although it is not nice:
 
 keep - rep(TRUE, nrow(tooliken.s)
 keep[toolik250.cca$na.action] - FALSE
 m2 - update(toolik250.cca, subset = keep)
 anova(m2, by=terms, perm=999)

Actually, there is a bit easier way of doing this, because 'subset' can also be 
a vector of indices, and negative indices acn be used to remove observations. 
If 'toolik250.cca' is a result object with missing observations, then

m2 - update(toolik250.cca, subset = -toolik250.cca$na.action)

will remove items listed as removed in 'na.action' (NB. the minus sign in 
'subset'). The update()d model will be equal to the original model, but missing 
data removed.

Cheers, Jari Oksanen 

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] anova.cca question / missing data in constraining matrix

2013-05-31 Thread Jari Oksanen
Hello, 
On 01/06/2013, at 02:10 AM, ckellogg wrote:

 Hello,
 I am using the cca function in Vegan to examine the relationship between
 microbial community structure and a (large) suite of environmental
 variables.  My constraining/environmental data matrix as a lot of holes in
 it so I have been exploring using the na.action argument. 
 This is the command I am currently using:
 toolik250.cca-cca(toolikotus250.ra~julianday+logwindspd_max_1dayprior+lograin_3dayprior+sqrtwindspd_1dayprior+windspd_3dayprior+days_since_thaw+days_since_iceout+days_btw_iceoutandthaw+toolikepitemp_24h+logtemp+conductivity+pH,
 toolikenv.s, na.action=na.omit)
 
 The CCA seems to run just fine, but when I attempt to do the posthoc tests
 such as anova.cca (anova(toolik250.cca,by='terms',perm=999), I get an error
 message: Error in anova.ccabyterm(object, step = step, ...) : number of
 rows has changed: remove missing values?  What exactly is occurring here to
 cause this error - I suspect it must be related to the fact that the
 environmental data matrix has a lot of missing data.  I don't quite
 understand why it states that the number of rows has changed...changed from
 what?  

The number of rows has changed from term to term. That is, you have different 
numbers of missing values in each term (= explanatory variable), and when rows 
with missing values are removed for the current model, the accepted 
observations change from term to term. I admit the error message is not the 
most obvious one. I must see where it comes from, and how to make it more 
informative. However, it does give a hint to remove missing values, doesn't 
it?

If you want to have a term-wise test with missing values in terms, you must 
refit your model for complete.cases.  Use argument 'subset' to select a subset 
with no missing values. Currently I don't know any nice short cut to do this 
with the current mode, but the following may work (untested), although it is 
not nice:

keep - rep(TRUE, nrow(tooliken.s)
keep[toolik250.cca$na.action] - FALSE
m2 - update(toolik250.cca, subset = keep)
anova(m2, by=terms, perm=999)

 Is there any way to get around having missing data when running the
 anovas as you can when running the CCA itself?
 
 One other question I have is when I try and run the CCA on all the data in
 my environmental data matrix (toolikenv.s), not just a subset of variables
 as I do above, using this command:
 toolik250.cca -cca(toolikotus250.ra~., toolikenv.s, na.action=na.omit)
 I get the following error message.  Error in svd(Xbar, nu = 0, nv = 0) : a
 dimension is zero What might be causing this error message to be thrown?
 
What does sum(complete.cases(toolikenv.s)) give as a result? Does it give 0?

I suspect you have so many holes that nothing is left when you remove rows with 
any missing values. The message is about an attempt to analyse zero-dimensional 
matrix.

 Thank you so much for your help.  Maybe I will just have to filter out the
 samples with missing environmental data (or filter out some of the variables
 themselves if they have too much missing data), but I was just hoping to
 avoid having to do this.


The functions can handle missing values, but they handle them by removing the 
observation. Do you want to lose a huge number of rows? We won't invent values 
to replace missing data in cca(). Some people have suggested ways to do that, 
and that is not difficult: just search for imputation in R (for instance, 
package mice). However, the real problem is how to compare and summarize the 
multivariate results after imputation. Further, if you have a lot of missing 
values, nothing may be very reliable. It could be possible to collect together 
and combine permutation test results after multiple imputation, but better 
consult a statistician before trying to do this.

Cheers, Jari Oksanen

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Functional response type II question, curve fitting and

2013-05-30 Thread Jari Oksanen
Fernando,

For background reading you may check 

Stevens. M.H.H. (2009) A Primer of Ecology with R. Use R! Series. Springer.

and associated 'primer' package in CRAN. The book discusses functional 
responses.

Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] 
on behalf of Pierre THIRIET [pierre.d.thir...@gmail.com]
Sent: 30 May 2013 09:58
To: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Functional response type II question, curve fitting and

Dear Fernando,

I would suggest the function nls() that fit non-linear model.
You can specify by hand your equation and starting values of your
parameters (your guessed value)
R include some Self-Starting functions for the model the most used, this
could be easier for you to work with it. check ?selfStart()
According to wikipedia, Type II functional response is similar to the
Michaelis--Menten equation, which is implemented in R as self starting
function:http://stat.ethz.ch/R-manual/R-patched/library/stats/html/SSmicmen.html

By adpting examples, you could easily fit your model, if the choosen
equation is suitable.

About statistical comparisons of estimated parameters among curves, I
have no idea but I found, among other this post:
http://stats.stackexchange.com/questions/26611/how-to-test-the-effect-of-a-grouping-variable-with-a-non-linear-model

Good luck,
Pierre

Le 30/05/2013 07:49, Luis Fernando García Hernández a écrit :
 Dear all,

 I am new on the list and on the more complex applications of R, so I ask
 you to excuse me if my mail is too long.

 This time I have to do several tasks related tu functional reponse. I want
 to evaluate the relationship between prey density (x) vs prey consumed(y)
 the idea is to fit the data set to the functional reponse equation:
   Na/TP=a/(1+aThN). I have idea about the value of most of the equation
 parameters, but I am completely lost about how to fit the data points to
 this equation.

 On another hand I have to graph and compare if there are significative
 differences on estimated parameters of the curves for lepidopterans and
 ants, the final data should look like the attacher picture.If any of you
 have some idea about how to do this, would be really appreciated.

 Below you  can find the data and the pictures.

 Thanks in advance

  Prey Density Eaten  Lepidopteran 1 2  Lepidopteran 1 3  Lepidopteran 1 3
 Lepidopteran 3 4  Lepidopteran 3 5  Lepidopteran 3 4  Lepidopteran 5 7
 Lepidopteran 5 5  Lepidopteran 5 6  Lepidopteran 10 8  Lepidopteran 10 10
 Lepidopteran 10 7  Ant 1 3  Ant 1 1  Ant 3 4  Ant 3 2  Ant 5 4  Ant 5 6  Ant
 10 5  Ant 10 6


 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


[[alternative HTML version deleted]]


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] adonis

2013-05-26 Thread Jari Oksanen
Alan,

A few comments on your procedure. You have two non-standard things in your 
message: you try to do something that looks like post hoc tests, and you use 
non-standard contrasts. There is nothing post hoc in your post hoc tests. What 
you do is that you break your factor variable into separate contrasts. If do 
so, you should carefully read the adonis output which says

Terms added sequentially (first to last)

If your contrasts are correlated, like they are in the example you gave, the 
results for individual terms will depend on the order of terms. Usually people 
associate post hoc tests with multiple testing problem, but there is nothing 
about that in the example you gave. It is just simple testing of individual 
contrasts.

Second point is that you used non-standard contrasts. The species coefficients 
will depend on contrasts and therefore they change. There are easier way of 
doing the same. For instance, you seem to want to have sum contrasts, but with 
different baseline level. Check functions like model.matrix, contrasts, 
relevel, and as.data.frame. However, the magnitude of coefficient also depends 
on specific contrasts that you use.

Cheers, Jari Oksanen

On 24/05/2013, at 16:48 PM, Alan Haynes wrote:

 Hi all,
 
 Im using adonis for some plant community analysis and have been following
 theBioBucket example of how to posthoc tests (
 http://thebiobucket.blogspot.ch/2011/08/two-way-permanova-adonis-with-custom.html
 )
 
 
 
 data(dune)
 data(dune.env)
 ad1 - adonis(dune ~ Management, data=dune.env, permutations=99)
 # Call:
 # adonis(formula = dune ~ Management, data = dune.env, permutations = 99)
 #
 # Terms added sequentially (first to last)
 #
 # Df SumsOfSqs MeanSqs F.Model  R2 Pr(F)
 # Management  31.4686 0.48953  2.7672 0.34161   0.01 **
 # Residuals  162.8304 0.17690 0.65839
 # Total  194.2990 1.0
 # ---
 #  Signif. codes:  0 Œ***‚ 0.001 Œ**‚ 0.01 Œ*‚ 0.05 Œ.‚ 0.1 Œ ‚ 1
 
 man - dune.env$Management
 contmat - cbind(c(1,-1,0,0), c(1,0,-1,0), # construct a new contrast matrix
 c(1,0,0,-1), c(0,1,-1,0),
 c(0,1,0,-1), c(0,0,1,-1))
 contrasts(man) - contmat[,1:4]
 trt1.2 - model.matrix(~ man)[,2]
 trt1.3 - model.matrix(~ man)[,3]
 trt1.4 - model.matrix(~ man)[,4]
 
 ad2 - adonis(dune ~ trt1.2 + trt1.3 + trt1.4 )
 # Call:
 #  adonis(formula = dune ~ trt1.2 + trt1.3 + trt1.4)
 #
 # Terms added sequentially (first to last)
 #
 # Df SumsOfSqs MeanSqs F.Model  R2 Pr(F)
 # trt1.2 10.1483 0.14827  0.8381 0.03449  0.545
 # trt1.3 10.8371 0.83712  4.7321 0.19472  0.001 ***
 # trt1.4 10.4832 0.48321  2.7315 0.11240  0.032 *
 # Residuals 162.8304 0.17690 0.65839
 # Total 194.2990 1.0
 # ---
 # Signif. codes:  0 Œ***‚ 0.001 Œ**‚ 0.01 Œ*‚ 0.05 Œ.‚ 0.1 Œ ‚ 1
 
 
 I was just wondering whether it was fair to say that the species with high
 coefficients (adonis(...)$coefficients) were the ones causing that
 difference?
 
 ad2$coefficients[3,abs(ad$coefficients[3,])1]
 # ElepalPoapraSalrepPoatriElyrepLolperAlogen
 # -1.091667  1.975000 -1.375000  3.28  1.33  3.00  1.65
 
 If so, would it be better to take the coefficients from the original model
 or the model used for the contrast, as these yield different results:
 
 ad1$coefficients[3,abs(ad1$coefficients[3,])1]
 # Rumace   Tripra   Poatri   Plalan
 # 2.316667 1.35 1.516667 1.541667
 
 
 Cheers,
 
 Alan
 
 
 --
 Email: aghay...@gmail.com
 Mobile: +41763389128
 Skype: aghaynes
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] using Pearson's Chi-squared to verify dependence among species distribuion

2013-05-01 Thread Jari Oksanen

On 01/05/2013, at 06:09 AM, Antonio Silva wrote:

 Thanks Zoltan R R-sig list members
 
 I still with some doubts.
 
 Is there a way to calculate in R the results for the equation
 
 X2= n*[|ad-bc|-(p/2)]^2 / [(a+b)*(c+d)*(a+c)*(b+d)]
 
 (see Legendre  Legendre pg 295 eq. 7.6)

This should be doable in vegan::designdist() function. Set 'abcd = TRUE' and 
you can directly use terms like you defined them above. I won't give the 
equation here as I don't know what is the equation you used: Your previous 
message had a different equation than this one here with some mix up with 'n' 
and 'p'. The designdist() function calculates (dis)similarities between rows. 
You must transpose (t()) your data if you want to have dissimilarities between 
columns. The code is pure R so that you can see how to do these calculations 
yourself.

Cheers, Jari Oksanen

 
 for the following data:
 
 
 ST01 ST02 ST03 ST04 ST05 ST06 ST07 ST08 ST09  SP1 1 1 1 1 1 0 0 1 1  SP2 1 1
 0 1 0 1 0 1 0
 
 We have a=4, b=3, c=1, d=1, N=9 and the 2x2 table for this example is
 
 
 
 SP2
 
 
 Presence Absent
 SP1 Presence 4 3
 Absent 1 1
 
 
 
 
 
 Following the Chi square formula I got 110.25 / 280 = 0.3937
 
 I have tried many things, without success.
 
 Thanks in advance for any suggestion.
 
 Antonio Olinto
 
 
 2013/4/25 Zoltan Botta-Dukat botta-dukat.zol...@okologia.mta.hu
 
 Dear Antonio,
 
 Try this:
 
 chisq.test(table(sp1,sp2))
 
 Best wishes
 
 Zoltan
 
 
 
 
 On Wed, Apr 24, 2013 at 6:02 PM, Antonio Silva aolinto@gmail.com
 wrote:
 
 Hi,
 
 I'm trying to use Pearson's Chi-squared to verify the dependence among
 species distribuion.
 
 I have a dataframe with the presence/absence data of two species in a
 number of sample units
 
 The equation I'm using is:
 
 X2= p*(|ad-bc|-p/2)^2 / ((a+b)*(c+d)*(a+c)*(b+d))
 
 where a is the number of double presence (1-1), b is the number of 1-0,
 c
 is the number of 0-1 and d is the number of 0,0 (double absence)
 p is a+b+c+d
 
 Is there a function to caltulate it using R? I could not understand how
 to
 use chisq.test function for this.
 
 Thanks in advance.
 
 Antonio Olinto
 
 
 --
 
 Zoltán BOTTA-Dukát
 
 Institute of Ecology and Botany
 Hungarian Academy of Sciences
 Centre for Ecological Research
 
 H-2163 Vácrátót, Alkomány u. 2-4.
 HUNGARY
 Phone: +36 28 360122/157
 Fax..: +36 28 360110
 botta-dukat.zol...@okologia.mta.hu
 www.okologia.mta.hu
 
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] envfit and NMDS

2013-04-24 Thread Jari Oksanen
Howdy folks,

This is the second time this week we have this issue. There are two (or three) 
separate points:

(1) You should not correlate environmental variables with axes in any 
ordination method. This applies to PCoA, PCA, CA or anything else just as well 
as to NMDS. You can see this by fitting the vectors: they are rarely parallel 
to the axes. Even in CCA/RDA, the vectors for constraints are rarely parallel 
to the axes.

(2) The ordination space in NMDS is metric. The non-metric part is the 
monotonic (non-metric) regression from metric ordination space to observed 
dissimilarities. The observed dissimilarities between sampling units (plots, 
sites) need not be metric, but they can be semimetric or non-metric, but the 
ordination space derived from them is metric. 

(3) As a separate issue, it is often better to use fitted surfaces than fitted 
vectors. Fitted vectors are appropriate when the fitted surface is a plane 
(first degree linear trend surface). This is rarely the case, and this applies 
to all ordination methods: the fitted surfaces in CA, PCoA or PCA are usually 
just as non-planar as in NMDS.

For point 2: Look at the stressplot(NMDS-result). Here horizontal axis gives 
Euclidean distances in NMDS space -- these are metric. The vertical axis gives 
the observed dissimilarities -- these can be anything. The fit lines gives the 
monotonic regression -- this is non-metric. 

With vegan::metaMDS() the ordination space is not only metric, but it is 
strictly Euclidean. We do and we can rotate the ordination space.

As a historic note, the vector fitting code for vegan was based on a Bell Labs 
document that describes vector fitting for their NMDS (KYST software). The Bell 
folks invented NMDS, and they regarded vector fitting suitable for NMDS from 
the very beginning. That is, form 1960s.

Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] 
on behalf of Erin Nuccio [enuc...@gmail.com]
Sent: 24 April 2013 12:30
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] envfit and NMDS

Hello list,

I commonly see envfit used for NMDS, and am curious if envfit is considered a 
non-metric vector fitting tool.  This question came up during a conversation 
with a colleague who only uses envfit with PCoA, because they are concerned 
that to do this would be problematic for the same reason you are not supposed 
to correlate environmental variables with NMDS axes (you can't correlate 
something that's non-metric with a metric variable).  To me, it seems like by 
projecting the metric variable into non-metric space, you're essentially making 
it non-metric, and the correlation would be fine.

If anyone could weigh in and clear up the confusion, that would be great.  
Thanks,
Erin
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] RE : CCA vs NMDS and ordisurf

2013-04-22 Thread Jari Oksanen
I also suggest (like I have suggested before) that you run metaMDS with 
argument plot = TRUE. The convergence criteria in metaMDS are pretty stringent, 
but with plot argument you can see how different the solutions are. Two most 
typical non-convergence cases are that 

(1) most points are stable, but there are a some outliers that don't find their 
place in this universe, and

(2) your data need more dimensions and you should increase 'k'.

Then you should also check the stressplot( ). If the fit line shoots right up 
at the maximum observed dissimilarity, you may need to turn on 'noshare' 
argument in metaMDS to trigger step across dissimilarities. We claim that this 
rarely necessary with the monoMDS engine we use currently, but sometimes it is 
needed.

Without hands on your data it is difficult to guess more.

Cheers, Jari Oksanen


Sent from my iPad

On 22.4.2013, at 22.31, Gavin Simpson gavin.simp...@ucl.ac.uk wrote:

 I would say that it *is* important, in general. However, you don't say
 if you retried running `monoMDS` on the Hellinger transformed data
 (without the Bray-Curtis metric - you should use Euclidean with
 Hellinger transformation)? If you didn't try rerunning with out
 Bray-Curtis and see if it converges. Otherwise, try many more iterations
 and get vegan to start monoMDS from the best solution from the first set
 of runs.
 
 See `?metaMDS for details.
 
 G
 
 On Mon, 2013-04-22 at 08:26 +, Aurélie Boissezon wrote:
 Hello everybody!
 I didn't imagine that my questions will lead to such a debate among 
 researchers :) . It helps me to get ready for future reviewers' comments.  ;)
 Just a question still opened about NMDS (Gavin?):
 Is it important to reach a convergent solution? since the best solution 
 ordinate species always in similar way? Because as I said even with stricter 
 criteria the analysis don't reach a convergent solution.
 
 Best regards,
 
 Aurélie
 
 ---
 Aurélie Rey-Boissezon
 Ph-D Student
 University of Geneva
 Section of Earth and Environmental Sciences - Institute F.-A. Forel
 Aquatic Ecology Group
 Uni Rondeau
 Site de Battelle - Bâtiment D
 7, route de Drize - 1227 Carouge
 Geneva
 Switzerland
 Tel. 0041 (0) 22379 04 88
 
 aurelie.boisse...@unige.ch
 http://leba.unige.ch/team/aboissezon.html
 
 De : fgill...@gmail.com [fgill...@gmail.com] de la part de François Gillet 
 [francois.gil...@univ-fcomte.fr]
 Date d'envoi : samedi 20 avril 2013 10:59
 À : Gavin Simpson
 Cc: Aurélie Boissezon; r-sig-ecology@r-project.org
 Objet : Re: [R-sig-eco] RE : CCA vs NMDS and ordisurf
 
 
 2013/4/19 Gavin Simpson 
 gavin.simp...@ucl.ac.ukmailto:gavin.simp...@ucl.ac.uk
 I really don't see why this has to be an either/or situation.
 
 I fully agree: direct and indirect gradient analyses are complementary! 
 Sorry for not having stressed that in my short answers...
 
 François
 
 
 -- 
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] CCA vs NMDS and ordisurf

2013-04-18 Thread Jari Oksanen
Hello folks,

Only one point here:

On 18/04/2013, at 15:52 PM, Pierre THIRIET wrote:
 
 
 Pay attention in using NMDS. As you said,  it is rank-based, this is why 
 fitting environmental vectors to NMDS biplot is not so appropriate, despite 
 widely done. I don't see the problem about ordisurf and PCoA or CAP: Ordisurf 
 enables you to fit environnemental variables that have non-linear 
 relationships with PC of distance based ordinations.


This is not true. I have seen this sometimes in Internet, but this really is 
not true: The NMDS ordination space is strictly *metric*. In vegan it is even 
strictly *Euclidean*. So it is absolutely correct to fit vectors to NMDS 
ordination. (In MASS::isoMDS you can also have any Minkowski metric, but only 
Euclidean or Minkowski with exponent=2 is allowed in vegan even with isoMDS.)

What is non-metric is the monotonic regression from *metric* ordination to any 
dissimilarity measure. So NMDS finds metric solution from any dissimilarity 
measure.

I

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Conversion of lat/long coordinates to ETRS89

2013-03-20 Thread Jari Oksanen
Daniel,
On 20/03/2013, at 10:45 AM, Levy, Yair wrote:

 Dear Mauricio,
 
 
 I would confirm the use of rgdal::spTransform as appropriate. Also, one could 
 add creating a CRS database as follows:
 
 #European Petroleum Survey Group list creation
 EPSG - make_EPSG()
 
 The geographical ETRS89 CRS is then the following I believe:
 
 #ETRS89 (= EUREF89 = GRS80)
 CRS(+init=epsg:4258)
 
 

Actually ETRS89 / ETRS-TM35FIN for Finland is epsg code 3067.

 Applying spTransform following this function's instructions shouldn't be a 
 challenge I believe. I would strongly recommend comparing your new 
 geographical/cartographical projections the first time you use this function 
 in order to validate their accuracies.
 

The following should work:

library(rgdal)
xy - data.frame(N = 60, E = 27)
coordinates(xy) - ~ E + N
proj4string(xy) - CRS(+proj=longlat)
spTransform(xy, CRS(+init=epsg:3067))

and it is really useful to validate the results...

Cheers, Jari Oksanen (Finland)

 -Oorspronkelijk bericht-
 Van: r-sig-ecology-boun...@r-project.org 
 [mailto:r-sig-ecology-boun...@r-project.org] Namens Mauricio 
 Zambrano-Bigiarini
 Verzonden: mercredi 20 mars 2013 09:29
 Aan: r-sig-ecology@r-project.org
 Onderwerp: Re: [R-sig-eco] Conversion of lat/long coordinates to ETRS89
 
 On 20/03/13 08:33, Daniel Flø wrote:
 Hi,
 
 We have several lat/long coordinates for Finland that we would like to 
 convert into a selected projection (such as ETRS89/ETRS-TM35FIN or other).
 Could someone inform us about what package and function we should use to 
 batch converte, and details in the procedure we should be aware of?
 
 ?rgdal::spTransform
 
 r-sig-geo is a better place for more specific questions.
 
 Kind regards,
 
 Mauricio Zambrano-Bigiarini
 
 --
 =
 Water Resources Unit
 Institute for Environment and Sustainability (IES) Joint Research Centre 
 (JRC), European Commission TP 261, Via Enrico Fermi 2749, 21027 Ispra (VA), IT
 webinfo: http://floods.jrc.ec.europa.eu/
 =
 DISCLAIMER:
 The views expressed are purely those of the writer and may not in any 
 circumstances be regarded as sta- ting an official position of the European 
 Commission
 =
 Linux user #454569 -- Ubuntu user #17469 
 =
 It is not enough to have knowledge;
 one must also apply it (Goethe)
 
 We do not have additional information to the lat/long coordinates, but this 
 is may be not necessary?
 
 
 
 
 
 Daniel Flø
 PhD Fellow, Forest Health
 The Norwegian Forest and Landscape Institute Pb 115, NO-1431 Ås
 Office:   (+47) 64 94 90 28
 Mobile: (+47) 91 14 52 74
 Skype:floe.daniel
 -
 www.skogoglandskap.nohttp://www.skogoglandskap.no/
 -
 
 
  [[alternative HTML version deleted]]
 
 
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] nested.npmanova -- distance matrices as input?

2013-03-11 Thread Jari Oksanen
Erin,

R is open source: you can see the source code if in doubt. 

Looking at the source code, the situation is a bit unclear and depends on the 
details of your data that I don't know. It seems that nested.npmanova() accepst 
R distances structures of class dist. If your Unifrac distances inherit from 
dist class, nested.npmanova(), they will be handled correctly in 
nested.npmanova. If they are, like your write, distance matrices, then they 
will be handled incorrectly: they are accepted silently but treated like they 
were raw data matrices. You should get an error message from vegdist() in that 
case, as it knows no method = FALSE, so it may be that your dissimilarities 
were correctly handled. However, we don't know as we even do not know what 
software (R package, external software) you used in calculating unifrac 
dissimilarities. 

If 'd' are your distances, see what does class(d) say. If says dist (possibly 
with some other alternatives), you are safe. If your 'd' are not of class 
dist, you can try if as.dist(d) changes them to dist.

Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] 
on behalf of Erin Nuccio [enuc...@gmail.com]
Sent: 11 March 2013 07:08
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] nested.npmanova -- distance matrices as input?

Hello list,

Does anyone know nested.npmanova can take distance matrices as input?   When I 
read the helpfile, it specifies that the input for nested.npmanova is a data 
frame, and sounds like distance matrices can only be used for 
nested.anova.dbrda (see below).  However, if I try inputting a Unifrac distance 
matrix, and make method = FALSE, it completes without any errors.  Did this 
complete correctly?

formulaFormula with a community data frame (with sites as rows, species as 
columns and species abundance as cell values) or (for nested.anova.dbrda only) 
distance matrix on the left-hand side and two categorical variables on the 
right-hand side (with the second variable assumed to be nested within the 
first).

Thank you,
Erin
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] question for the R community : Plot RDA biplot without axis ?

2013-02-26 Thread Jari Oksanen
Sarah,

I added argument 'axis.bp' to text.cca and points.cca functions. To upgraded 
functions can be found in http://vegan.r-forge.r-project.org/ (rev2452), and 
will probably be included in the next minor release of vegan (2.0-7) scheduled 
for March, 2013.

You can also get the single files from R-Forge, or you can install devel 
version of vegan with

install.packages(vegan, repos=http://r-forge.r-project.org;)

It will take a day at minimum to get the version packaged in R-Forge.

Cheers, Jari Oksanen

On 26 Feb 2013, at 3:49, Sarah Loboda wrote:

 Hi,
 Here's the reproducible example that I made with dune data. When you do the
 4 graphs, you can see that because of the text () function, there is an
 axis on the right and values appear in the plots on the right side. I
 understand that it is because of my text () function, but is there a way to
 delete that axis in the text funtion? if not, is there another way to plot
 my data on 4 panels without axis?
 
 I don't know what you mean by body of vegan text.cca. You mean in the
 vegan tutorial ?
 I used col.axis because ann=FALSE as an argument in plot function does not
 work and col.axis seems fine...
 
 Thank you very much for your time. I really appreciate your help :D
 
 library(vegan)
 library(MASS)
 
 ### data
 data(dune)
 data(dune.env)
 
 ### Constrained ordination
 dune.hel-decostand(dune, hellinger)
 dune.cca - cca(dune ~ A1 + Manure, data=dune.env)
 
 ### Plot with 4 panels
 par(mfrow=c(2,2))
 par(mar=c(0.3,0.3,0.3,0.3))
 
 
 ### plot 1
 plot(dune.cca, type = n, scaling = 2, col.axis=white)
 with(dune.env, points(dune.cca, display = sites, scaling = 2, cex=1.3,
 col=2))
 ### When I add the next line, it adds env. variables as arrows but also
 adds an axis on the right
 text(dune.cca, display=bp, col=1, cex=1.1)
 
 ###plot 2
 plot(dune.cca, type = n, scaling = 2, col.axis=white, col=grey)
 with(dune.env, points(dune.cca, display = sites, scaling = 2, cex=1.3,
 col=2))
 text(dune.cca, display=bp, col=1, cex=1.1)
 
 ###plot 3
 plot(dune.cca, type = n, scaling = 2, col.axis=white)
 with(dune.env, points(dune.cca, display = sites, scaling = 2, cex=1.3,
 col=2))
 text(dune.cca, display=bp, col=1, cex=1.1)
 
 ###plot 4
 plot(dune.cca, type = n, scaling = 2, col.axis=white)
 with(dune.env, points(dune.cca, display = sites, scaling = 2, cex=1.3,
 col=2))
 text(dune.cca, display=bp, col=1, cex=1.1)
 
 2013/2/25 Gavin Simpson gavin.simp...@ucl.ac.uk
 
 On Mon, 2013-02-25 at 13:18 -0500, Sarah Loboda wrote:
 Hi,
 I have trouble to obtain the ordination graph I want. I want to have 4
 RDA
 biplot on the same page and I don't want to have (or I want to modify)
 the
 axis numbers. I want the marks on the axis without numbers to maximize
 the
 space for each RDA plot.
 
 A problem is the call to text() ( which calls text.cca() ). It doesn't
 pass on arguments to the underlying axis() calls and hence you can't do
 what you are trying to do with that function directly.
 
 Not sure why you want the axis to be white - that draws an axis so it
 will obscure anything drawn before it with white paint.
 
 The only solution at the moment will be to modify the vegan:::text.cca()
 function to change the two calls to axis() at the end of the function
 definition. I suspect you could just copy the body of vegan:::text.cca
 and put it into your own function, but I haven't tried it. If that fails
 due to namespace issues, then use assignInNamespace() to assign your
 function to the text.cca function in vegans namespace.
 
 See the relevant help pages on how to do this. I'm about to leave the
 office so I can't help further now, but if you have trouble email back
 to the list and I'll see about cooking up and example...
 
 All the best
 
 Gavin
 
 This seems like a simple task but I tried different approaches and
 coudn't
 figure out how to change my axis. This is my code :
 
 par(mfrow=c(2,2))
 par(mar=c(0.2,0.2,0.2,0.2))
 
 ### first RDA biplot
 with(arctic, levels(site))
 shapevec- c(19,19,19,19,19,19,19,19,19,19,19,19,6,6,6,6,6,6,6,6,6,6,6,6)
 plot(spiders.rda.a, type = n, scaling = 2, las=1, tcl=0.2,
 col.axis=white)
 with(spiders.env.a, points(spiders.rda.a, display = sites,
scaling = 2, pch = shapevec, cex=1.3))
 text(spiders.rda.a, display=bp, cex=1.1, col.axis=white, ann=FALSE)
  it is when I run this line that my y axis appear on the right but I
 don't want to
 ### in yellow, this is what I tried to make it diseappear. To put those
 arguments in plot() doesn't change anything.
 
 What should I had in the text part to make sure that the axis doesn't
 show
 up?
 My intention is to plot my sites as dots without text and my arrows for
 environmental variables with the name of each variable. Any other ideas
 on
 rda plot will be greatly appreciated.
 Thank you very much :D
 
 
 
 
 
 
 
 -- 
 Sarah Loboda
 MSc candidate, Entomology,
 Department of Natural Resource Sciences
 McGill University
 *http://insectecology.mcgill.ca

Re: [R-sig-eco] Issue with BiodiversityR::nested.npmanova

2013-02-23 Thread Jari Oksanen
Kay,

This should not work if the function is correctly written. You say that the 
terms in your formula are in data=env, but there are no variables env$NFac and 
env$Fac in env -- but there are NFac and Fac.

Cheers, Jari

Sent from my iPad

On 23.2.2013, at 12.15, Kay Cichini kay.cich...@gmail.com wrote:

 Hi all,
 
 Does anyone have a clue why this is not working:
 
 library(BiodiversityR)
 library(vegan)
 
 data(mite)
 mite - mite[1:60,]
 
 env - data.frame(Fac = rep(LETTERS[1:2], each = 30), NFac =
 rep(letters[1:4], each = 15))
 
 nested.npmanova(mite ~ env$Fac + env$NFac, data = env, method = jac,
 permutations = 1000)
 
 Regards,
 Kay
 
 -- 
 
 Kay Cichini, MSc Biol
 
 Grubenweg 22, 6071 Aldrans
 
 Tel.: 0650 9359101
 
 E-Mail: kay.cich...@gmail.com
 
 Web: 
 www.theBioBucket.blogspot.co.athttp://www.thebiobucket.blogspot.co.at/http://www.theBioBucket.blogspot.co.at
 --
 
[[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] adonis and temporal changes

2013-02-18 Thread Jari Oksanen

On 18/02/2013, at 14:04 PM, Pierre THIRIET wrote:

 Dear Valérie,
 
 If I remember well, your design includes:
 Isolation categories: 3 levels
 Sites: nested within Isolation categories (10 levels, a total of 30 sites)
 How many replicates per site and time?
 Time:? how many years you have? Only one sampling per year? Within sites and 
 years, samples were random or it is always exactly the same area you sample 
 (e.g. permanent quadrats)?
 
 for adonis, consider that strata is for constraining permutations, which is 
 different than terms in the formulae.
 
Exactly. The 'strata' only influence the permutations and have no effect in 
formula nor effect defined in the formula.

Currently the 'strata' are the only way to constrain the permutations. However, 
in the R-Forge version of vegan and in vegan 2.2-0 (to be released in April) 
you can give a permutation matrix as an input to adonis. You can generate the 
permutation matrix with, say, shuffleSet function of the permute package. This 
allows generation of restricted permutations for instance for time series. 
Vegan command vegandocs(permutations) will open up the vignette of the 
permute package for your inspection, and this will give some examples of 
defining restricted permutations.   At some timeframe we are completely moving 
to the permute package, but you can already use its permutation matrices  as 
input with these new and upcoming versions of vegan from R-Forge.

Cheers, Jari
--
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Fwd: vegdist Error en double(N * (N - 1)/2) : tama?o del vector especificado es muy grande

2013-02-12 Thread Jari Oksanen
Dear Carolina Bello,

You asked this same thing in the general R mailing list, and Brian Ripley 
answered to you on Saturday. The essential things he told you were that you 
cannot do that with 32G of RAM, and that you should rethink your problem. All 
we can do here is to repeat his message, and Tom Philippi already did so. 

With N = 138037 you need 71G to store the result, and 32 G of RAM is too 
little. I don't know how much further you would get with vegan:::vegdist in R 
3.0.0 but at least the error message will change to a Spanish version of 
Error: cannot allocate vector of size 71.0 Gb. 

You really should re-think your problem. You need to use methods that can 
handle large data sets like that or  you need to thin your data. Your data are 
modelled? At least I find it difficult to believe that you really have 
observations on 89 species in 138037 grid cells in rugged terrain like the 
Andes. 

Cheers, Jari Oksanen

On 12/02/2013, at 00:15 AM, Carolina Bello wrote:

 Hi
 I have some problems with the vegdist function.I want to do a hierarchical
 cluster from 138037 pixels of 1 lkm^2 from a study area of colombian Andes.
 I have distributions models for 89 species so i have a matrix with the
 pixels in the rows and species in the columns and is full with
 absence(0)/presence(1) of each species per each pixel. I think the bigger
 problem is that for agglomeration method in the hierarchical cluster i need
 the hole matrix so i can´t divided it.
 
 For doing this I want to calculate a
 distance matrix with jaccard. I have binary data.
 
 The problem is that i have a matrix of 138037 rows (sites) and 89 columns
 (species). my script is:
 
rm(list=ls(all=T))
 
gc() ##para borrar todo lo que quede oculto en memoria
 
memory.limit(size = 10) # it gives 1 Tera from HDD in case ram
 memory is over
 
DF=as.data.frame(MODELOS)
 
DF=na.omit(DF)
 
DISTAN=vegdist(DF[,2:ncol(DF)],jaccard)
 
 Almost immediately IT produces the error:* Error en double(N * (N - 1)/2) :
 tamaño del vector especificado es muy grande*
 
 I think this a memory error, but i don´t know why if i have a pc with 32GB
 of ram and 1 Tera of HDD.
 
 I also try to do a dist matrix whit the function dist from package proxy, i
 did:
 
  library(proxy)
 
vector=dist(DF, method = Jaccard)
 
 it starts to run but when it gets to 10 GB of ram, a window announces that R
 committed an error and it will close, so it closes and start a new section.
 
 I really don't know what is going on and less how to solve this, can anybody
 help me?
 
 thanks
 
 Carolina Bello IAVH-COLOMBIA
 
 
 
 
 --
 View this message in context:
 http://r.789695.n4.nabble.com/vegdist-Error-en-double-N-N-1-2-tama-o-del-vector-especificado-es-muy-grande-tp4658010.html
 Sent from the R help mailing list archive at Nabble.com.
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Adonis and Random Effects

2013-02-04 Thread Jari Oksanen

On 04/02/2013, at 09:14 AM, Erin Nuccio wrote:

 
 Is adonis capable of modeling random effects?  

No, adonis does not know random effects.

The significance tests in adonis() are based on permutations. The key question 
for random effects is: can you design a permutation matrix that treats some 
variables like they were random and others like they were fixed. If your 
answer is yes, you can have random effects, and if you use the R-Forge 
versions of vegan (now 2.1-25), you can input your own permutation matrices. If 
your answer is I don't know, then there is no way of handling random effects. 

You cannot expect lme4 extensions to formula to work in other packages. We can 
parse many fixed effect formulae to a model matrix, but there is a long way of 
getting permutations to work in some specific way for variables tagged as 
random factors. 

Cheers, Jari Oksanen

 I'm analyzing the impact of a treatment on the microbial community in a 
 split-plot design (2 treatments per plot, 4 plots per grassland, 3 grasslands 
 total). I would like to quantify how much of the variance is due to the 
 Treatment versus the Grassland.  It seems like Grassland should be a random 
 effect, since there are thousands of grasslands, and I'm only looking at 3.
 
 I have tried to use the notation that works with lme4, and it's not working 
 for me (see below for formula and error messages).  If adonis can't do random 
 effects, are there any alternatives?  Or, considering my goal, are there any 
 other programs I should look into?  Any suggestions would be highly 
 appreciated!
 

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Query as db-RDA

2013-01-21 Thread Jari Oksanen

On 22/01/2013, at 08:09 AM, ecology_questi...@yahoo.co.jp
 wrote:
 
 
 I have been carried out a ordination programm, which is db-RDA based on 
 package vagan. I could show the ordination figure. However, I do not find 
 how X and Y values of the respective plots and of arrows of the respective 
 environmental variables were taken. Please teach me the way(s) and 
 programm(s) to get the X and Y values.
 

Hello,

They are extracted from the result object using scores() function. The scores() 
function is described together with the plot() function (see ?plot.cca, 
?scores.cca). The raw scores can be found from the result object (see 
?cca.object), but scores() also scales results as requested (argument 
'scaling').

HTH, オクサネ槍

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland




___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'

2013-01-16 Thread Jari Oksanen

On 16/01/2013, at 13:50 PM, Duarte Viana wrote:

 Hi Jesse and Gavin,
 
 I remember that post and I think that turning the argument LINPACK of
 the function svd on solved the problem. I did so by:
 
 1. pasting the code of function svd (that can be found in the source code
 of the base package) on the console
 2. changing the argument LINPACK=FALSE to LINPACK=TRUE
 3. re-assigning the function to the base package with the code:
 assignInNamespace(svd, svd, base)
 
 However I do not know whether this is the best solution. Perhaps Gavin
 can find out what the real problem is. I would appreciate if you both
 could keep me informed about any progress.

We have had this problem sometimes in svd in vegan functions (rda, cca), and 
then LINPACK = TRUE has helped. However, this is not a good solution as LINPACK 
is phasing out from R. In the current development version (and probably in the 
next R release later this year), LINPACK is defunct, and this solution won't 
work. 

It seems that the basic problem  is in the svd function of the LAPACK that is 
used as the default now and that will be the only choice later this year. 
LAPACK is an external library that is beyond the control of package developers 
and R core team so that these problems may be unsolvable. It seems that the 
problems with the LAPACK code are so common that even the help page of svd() 
function tells about them:

 Unsuccessful results from the underlying LAPACK code will result
 in an error giving a positive error code (most often ‘1’): these
 can only be interpreted by detailed study of the FORTRAN code but
 mean that the algorithm failed to converge.

Sometimes rescaling of the data (= multiplying with a suitable constant) has 
helped, too, and that may be the only thing we can try once LINPACK = TRUE 
option disappears within a few months.

Best wishes, Jari Oksanen

 
 On Wed, Jan 16, 2013 at 11:35 AM, Gavin Simpson gavin.simp...@ucl.ac.uk 
 wrote:
 Hi Jesse,
 
 Can you send me the data *and* *exact* code you used so I can look into
 this further? I promise to delete the data once I have gotten to the
 bottom of the problem.
 
 If you can, please do so *off list*. If you can't then it might help to
 scale the numbers a bit as range of 5 orders of magnitude may be causing
 some numerical issues with your data.
 
 Note this has nothing to do with vegan; cocorresp is a separate package.
 
 Re the last question; it is possible and IIRC there is some Matlab code
 to do some of this in the supplementary materials for the ter braak 
 schaffers paper. I got some way to implementing this in R but finishing
 it off went to the back burner and I never get back to it since.
 
 All the best,
 
 Gavin
 
 On Tue, 2013-01-15 at 19:09 -0800, Jesse_B wrote:
 Question 1 - It's been a while, so I don't know who will see this, but I am
 having the same issue.  I have count data from two species matrices (fish
 and inverts) and I am trying to run CoCA through cocorresp.  Symmetric CoCA
 works fine, and is the main thing that I need, but I would like to be able
 to switch predictor-response species matrices in a predictive CoCA, to see
 if there are differing patterns of top-down/bottom-up concordance.   I have
 substantial skew in the data, so I have log+1 transformed both sets of data.
 I can run crossval on the raw data (not transformed, 99 samples [33 sites
 over 3 seasons], 72 fish species, 226 invert species, individual numbers are
 in the same ranges, between 0 and 10K for both fish and inverts), but on the
 transformed data, I get the Error in La.svd(x, nu, nv) : error code 1 from
 Lapack routine 'dgesdd'  message consistently on the 5th site.  I am
 comfortable _using_ R and the vegan package in particular, but I am not
 experienced in more deep coding, so I don't have a handle on how to turn
 LINPACK on.  R version 2.15.2, vegan 2.0-4, cocorresp 0.2-0
 
 crossval(fishlog,invertlog)
 LOO - Site: 1 - Complete
 LOO - Site: 2 - Complete
 LOO - Site: 3 - Complete
 LOO - Site: 4 - Complete
 LOO - Site: 5Error in La.svd(x, nu, nv) : error code 1 from Lapack routine
 'dgesdd'
 
 Question 2 - Is it possible to run crossval on matricies for a CCA? to make
 it a PLS-CCA (as in Schaffers et al. 2008) or am I misunderstanding the
 process that they used?
 
 Thanks in advance!
 
 Jesse
 
 
 
 
 --
 View this message in context: 
 http://r-sig-ecology.471788.n2.nabble.com/Error-in-La-svd-x-nu-nv-error-code-1-from-Lapack-routine-dgesdd-tp7556369p7577802.html
 Sent from the r-sig-ecology mailing list archive at Nabble.com.
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 
 
 --
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street

Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?

2012-12-14 Thread Jari Oksanen
Claire, Here some small comments
On 13/12/2012, at 17:24 PM, claire della vedova wrote:

 Dear all,
 
 
 
 a)  Which ordination method would be better for my data : PCA knowing
 that the represented inertia is 35.62% or NMDS  with a stress value about
 0.22? 

These numbers cannot be used to say which of these methods is better. You need 
other criteria. Some people may have strong opinions on the choice here, but 
these opinions cannot be based on these numbers -- they are based on something 
else (I do have such an opinion, but I abstain from expressing my opinion).

 
 b)  If NMDS is more adapted which one is the better? with Hellinger
 normalization and Bray-Curtis distance, or with the normalization
 recommended by Legendre and Legendre  and Kulcynski distance ?
 
Hellinger transformation was suggested for Euclidean metric, and normally it is 
used in PCA/RDA (which are based on Euclidean metric although they do not 
explicitly calculate Euclidean distances). I haven't heard of any advantages of 
Hellinger transformation with Bray-Curtis dissimilarity. I suggest you don't 
use it with Bray-Curtis. I don't know if Kulczyński dissimilarity is any better 
than, say, Sørensen dissimilarity (and both seem to be difficult to spell), but 
certainly it belongs to the same group of usually well behaving dissimilarities 
as variants of Bray-Curtis or Jaccard.

 c)   Is there other method to apply? I’m going to try co-inertia with
 ade4 package
 
 
Certainly there is a high number of methods you can apply, but why? What you 
try to analyse? What are your questions?

Cheers, Jari Oksanen
-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa



___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Vegan metaMDS: unusual first run stress values with large data set

2012-12-12 Thread Jari Oksanen
Hello R-Community,

First my thanks to Ewan Isherwood who turned our attention to this issue and 
sent his data file to us for analysing the situation. 

It seems that the default convergence criteria are too slack in monoMDS() that 
was the ordination engine of metaMDS() in this case. Good news are that you can 
change those criteria by adding argument 'sfgrmin' to the metaMDS() call (this 
is documented in ?monoMDS). The following command seems to work:

 PSU.NMDS - metaMDS(PSU.sp, k=2, sfgrmin = 1e-7, distance = jaccard)


The default was 'sfgrmin = 1e-5' which was so slack the iteration stopped early 
and did not really converge close to the solution. With this option you can 
find that the correct stress is of magnitude 0.029 which is much lower than 
reported below. Moreover, the stresses of one-dimensional and two-dimensional 
solutions are very close to each other. (There was one outlier (P1763E) which 
only had one species (CHICRA) that occurred only in four other sites and 
distorted the results.)

I advice *against* using 'zerodist = add': it is not needed with monoMDS. 
Identical (distance = 0) sites will have identical scores if you do not use 
this argument. Using 'zerodist = add' is only necessary with MASS::isoMDS() 
that is unable to handle zero distances.

We have changed the default of 'sfgrmin' in http://www.r-forge.r-project.org/ 
so that you should not see this problem in the next vegan releases.

Cheers, Jari Oksanen

On 05/12/2012, at 21:15 PM, Ewan Isherwood wrote:

 Hello, R-Community! This is the first time writing to this group and
 indeed the first time using a mailing list, so please bear with me if
 I’ve done something wrong.
 
 I have a large species x site matrix (89 x 4831) that I want to
 ordinate using metaMDS in the Vegan (2.0-5) package in R (2.15.2). If
 I run this data frame using the Jaccard index in two or more
 dimensions (k1), the first run (run=0) has a relatively low stress
 value and the other 20 runs are much higher and have very low
 deviation. However, k=1 seems to work fine. Furthermore, a
 stress/scree plot reveals a pyramid-like shape, where the k=1 lowest
 stress value is low, increases rapidly for k=2 then decreases slowly
 as k increases.
 
 DimensionsStress
 1 0.1382185
 2 0.1939509
 3 0.1695375
 4 0.155221
 5 0.1406408
 6 0.1294149
 
 I’ve tried this with a small iteration of this data and this issue
 arises at k2 rather than at k1 as it is here. Anyway, this is the
 input and output:
 
 library(vegan)
 library(MASS)
 PSU - read.table(PSU.txt, header = TRUE, sep = )
 PSU.sp - PSU[, 22:110]
 PSU.NMDS - metaMDS(PSU.sp, k=4, zerodist = add, distance = jaccard)
 
 Square root transformation
 Wisconsin double standardization
 Zero dissimilarities changed into  0.0006657301
 Run 0 stress 0.155221
 Run 1 stress 0.2548103
 Run 2 stress 0.255434
 Run 3 stress 0.2551382
 … (Up to run 20 where run 1 through run 20 have all very similar stress 
 values.)
 
 Call:
 metaMDS(comm = PSU.sp, distance = jaccard, k = 4, zerodist = add)
 
 global Multidimensional Scaling using monoMDS
 
 Data: wisconsin(sqrt(PSU.sp))
 Distance: jaccard
 
 Dimensions: 4
 Stress: 0.155221
 Stress type 1, weak ties
 No convergent solutions - best solution after 20 tries
 Scaling: centring, PC rotation, halfchange scaling
 Species: expanded scores based on ‘wisconsin(sqrt(PSU.sp))’
 
 Now, again, with k=1 this does not happen – the solution looks like
 any other regular NMDS run. There are no blank values in the data as
 they are all numbers between 0 and 100 corresponding to % cover, and
 every row and column sum is greater than 0. There are many sites with
 the same species configurations, hence the zerodist, but omitting this
 makes no difference to the problem at hand. The NMDS works fine if I
 use a subset of the data, but I have not subsetted and tested all of
 it. Other metric (Euclidean) and nonmetric (Bray) dissimilarity
 indices result in the same effect. I’ve chosen k=4 here because of the
 (marginal) elbow in the stress plot, but the data itself actually
 looks pretty good at any k value. Even though the output is
 reasonable, I am concerned that hitting the best solution by a large
 amount on the first run means something is messing up, and this
 concern is amplified by the strange pyramid shaped stress plot.
 Because metaMDS uses random starts, I don't see how this output is
 possible. I've scoured the help files and archives of this list and I
 am really now at a loss to explain this.
 
 Thank you in advance for your time and consideration!
 
 Ewan
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org

Re: [R-sig-eco] summing F stats and permutation

2012-11-29 Thread Jari Oksanen
Steve,

The code gives you a vector of F-values calculated separately for each species. 
Taking sum of those vectors gives you the sum of all F-values.

For the ordinary F-values (sum numerators and denominators and get their ratio) 
you can use anova or pemutest functions in vegan.

Cheers, Jari
On 29 Nov 2012, at 18:41, Steve Brewer wrote:

 Jari,
 
 Thanks. This is helpful. I knew that RDA obtained predicted abundances for
 each species, but I didn't know that function rda could be used to
 calculate F statistics for each predictor for each species. That does seem
 like a promising way to approach the problem. So the code you've sent me
 is showing how to extract and sums the numerator and denominator SS and
 then calculate the ratio from those sums and the dfs? Or is it showing how
 to obtain Fs for each species and then summing them?
 
 Incidentally, I was initially very skeptical of the sumF approach until I
 started comparing sumF results to perMANOVA results (using PC-Ord) on
 several different datasets of mine and got strikingly similar results
 between the two techniques. It seems to go against what a lot of
 statisticians are saying about the unique value of multivariate stats. I
 don't have a satisfactory explanation for why it seems to work with my
 data.
 
 Steve
 
 
 J. Stephen Brewer 
 Professor 
 Department of Biology
 PO Box 1848
 University of Mississippi
 University, Mississippi 38677-1848
 Brewer web page - http://home.olemiss.edu/~jbrewer/
 FAX - 662-915-5144
 Phone - 662-915-1077
 
 
 
 
 On 11/29/12 9:43 AM, Jari Oksanen jari.oksa...@oulu.fi wrote:
 
 Steve,
 
 This is R, so it is not about whether this can be done, but how this can
 be done. Unfortunately, doing exactly this requires some fluency in R.
 Doing something similar is very simple.
 
 The description of your problem sounds very much like the description of
 permutation test in redundancy analysis (RDA). The difference is that in
 RDA you sum up nominators and denominators before getting the ratio, but
 in your model you sum up the ratios. So in RDA test you have (num_1 +
 num_2 + ... + num_p)/(den_1 + den_2 + ... + den_p), and in your
 description you have num_1/den_1 + num_2/den_2 + ... + num_p/den_p. The
 former test in canned for instance in the vegan package, but the latter
 you must develop yourself (and the former method of summing variances
 instead of their ratios feels sounder). It would not be too many lines of
 code to fit your code, though. Please note that RDA works by fitting
 linear models for each species independently so that you can get all
 needed statistics from a fitted RDA in the vegan package (function rda).
 The following function extracts F-values by species from a fitted
 vegan:::rda() result object:
 
 spF -
 function (object) 
 {
  inherits(object, rda) || stop(needs an rda() result object)
  df1 - object$CCA$qrank
  df2 - nrow(object$CA$Xbar) - df1 - 1
  num - colSums(predict(object, type=working, model=CCA)^2)
  den - colSums(predict(object, type=working, model=CA)^2)
  (num/df1)/(den/df2)
 }
 
 HTH, Jari Oksanen
 
 From: r-sig-ecology-boun...@r-project.org
 [r-sig-ecology-boun...@r-project.org] on behalf of Steve Brewer
 [jbre...@olemiss.edu]
 Sent: 29 November 2012 16:42
 To: r-sig-ecology@r-project.org
 Subject: [R-sig-eco] summing F stats and permutation
 
 Dear Colleagues,
 
 I'm wondering if anyone in this group has developed code for doing a sumF
 test for examining community responses in an experiment. For those not
 familiar, sumF is a simple univariate alternative to MANOVA and perMANOVA,
 wherein univariate ANOVAs and their associated F statistics are calculated
 for each species' abundance and then the F statistic for each effect is
 summed over all species. The significance of the resulting summed F
 statistic is then evaluated using random permutation. The summed F
 statistic
 is interpreted as an overall community response to the treatments, whereas
 the F statistic for each species provides a measure of the contribution
 that
 species makes to treatment differences.
 
 I could envision a variety of ways in which this could be done in R, but
 I'm
 not adept enough in R to figure out how to do it myself. One possibility
 might involve using permute or shuffle to get the randomized data
 matrices,
 but it is not clear to me how one could simultaneously calculate the
 Anovas
 for all species and sum the resulting F statistics for each random
 permutation. There is no reason why traditional F statistics would have to
 be used. Pseudo-F statistics based on distances for each species'
 abundance
 could be calculated instead and then summed across species.
 
 PLEASE NOTE THAT I AM ALREADY AWARE OF THE OBJECTIONS TO THIS APPROACH TO
 COMMUNITY ANALYSIS. Nevertheless, I am interested in pursuing this using
 R,
 if possible.
 
 Any suggestions are welcomed.
 
 Thanks,
 
 Steve
 J. Stephen Brewer
 Professor
 Department of Biology
 PO Box 1848

Re: [R-sig-eco] post-hoc test for adonis PERMANOVA

2012-11-25 Thread Jari Oksanen
There is a parametric post hoc test for betadisper(), but there is no post hoc 
test for adonis(). These two functions are different and study different null 
models. If anybody figures out how to do post hoc test for adonis() we may 
incorporate submitted code into vegan (though personally I think that the whole 
idea of post hoc tests is weird, to put it mildly).

Cheers, Jari Oksanen

Sent from my iPad

On 25.11.2012, at 0.54, Andre Rovai asro...@gmail.com wrote:

 Dear friends,
 
 I have been recently introduced to R and am dedicating myself to go deeper
 on it. It can be tricky sometimes for a rookie, though!
 
 I ran a PERMANOVA (vegan package) using the adonis() function and tried to
 go deeper in the analysis by running a post-hoc test (betadisper()). I do
 appreciate if someone more familiarized with codes related to the post-hoc
 analysis could validate the way I am headed to. Here are the codes I used:
 
 # permanova
 fl.sem.transf.bray - vegdist(fl, method=bray)
 flfat - read.table(flfator2.txt, header=T, row.names=1)
 adonis(fl.sem.transf.bray ~ loc*trat, data=flfat, strata=flfat$trat,
 permutations=999) # there was significant difference for the interaction
 
 # post-hoc test
 phoc - with(flfat, betadisper(fl.sem.transf.bray, trat))
 TukeyHSD(phoc)
 
 Thank you!
 Andre Rovai
 
[[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] ANOSIM

2012-11-25 Thread Jari Oksanen

On 23/11/2012, at 18:44 PM, aristide.andrianarim...@blueline.mg
 aristide.andrianarim...@blueline.mg wrote:

 Dear responsible,
 
 I want to know if ANOSIM (Global R, and p-value) are only use for
 dissimilarity matrix as well as similarity matrix
 
 If I use similarity matrix like Jaccard, is still the same interpretation
 of the output R= completely dissimilar and R = 0 completely similar?
 

Aristide,

If your are using the vegan implementation of anosim, the answer is: no, you 
cannot use anosim() for similarities but only for dissimilarities.

If you refer to the *method* (instead of *implementation*), the answer is: yes, 
of course you can use ANOSIM for similarities. You only need to implement it 
that way (change the signs, reverse the P-values compared to vegan::anosim()). 

The vegan implementation is made for R, and R deals with dissimilarities. Some 
packages may change this R convention, but similarities are not R compliant.

Cheers, Jari Oksanen
-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Multivariate ANOVA based on dissimilarities

2012-08-27 Thread Jari Oksanen
Mauricio,

Nothing reproducible here (we don't have your data sets), not even the output.

In your email you only write about adonis() and the p-values you got from 
adonis. On the other hand, your TukeyHSD() is based on betadisper(), and you 
never seem to check the betadisper() p-values. These two methods are radically 
different: adonis() is about differences between groups, and betadisper() is 
about dispersions within groups. We expect some kind of consistency between 
betadisper() p-values and TukeyHSD() of betadisper, but adonis() is quite a 
different beast which has little to do with betadisper().

If you take an analogy to standard statistics: adonis() is like (M)ANOVA, 
betadisper() is like Levene's test of homogeneity (what ever that means) of 
variances, and TukeyHSD() is its sequel about detecting cases with different 
variances.

I suggest you rather use 999 permutations than 1000: you can get nicer round 
figures. Once we were more patronizing and dropped one permutation if the user 
asked for even hundred or thousand, but then we thought that this is up to user 
to decide and none of our business.

Cheers, Jari Oksanen

On 27/08/2012, at 23:23 PM, Maurício Moraes Zenker wrote:

 Dear all,
 
 
 I‚m using the function „adonis‰ (multivariate ANOVA based on
 dissimilarities) to study beta diversity of moths between altitudinal
 levels and seasons. My analysis is based on the example included in the
 vegan tutorial „Multivariate Analysis of Ecological Communities in R: vegan
 tutorial‰, written by Jari Oksanen. As in the example, the analysis of my
 data indicated that there are significant differences. However, the Tukey‚s
 test indicated that there are no differences between groups? Why does it
 happen? Shouldn‚t the Tukey‚s test show that one group is different from
 another?
 
 The script is almost the same of the tutorial but with more permutations.
 
 
 
 Thank you.
 
 
 
 Mauricio
 
 
 
 Permanova-read.table(Permanova.txt,header=T)
 
 PermanovaII-read.table(PermanovaII.txt,header=T)
 
 betad - betadiver(Permanova, z)
 
 adonis(formula = betad ~ Altitude*Season,data = PermanovaII,
 
 permutations = 1000)
 
 hov - with(PermanovaII, betadisper(betad, Altitude))
 
 TukeyHSD(hov)
 
 hovII - with(PermanovaII, betadisper(betad, Season))
 
 TukeyHSD(hovII)
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] IndVal groups Plotting an RDA with labels

2012-07-03 Thread Jari Oksanen
 minimally edited copied  pasted vegan help page (here 
and all the following code). Don't do this but *follow* the documentation. If 
you give arguments like 'const', 'select', 'arrow.mul', you should give some 
value to those, or you get an error. Normally you do not need to give these 
arguments at all as the functions find defaults internally. You only must give 
them if you want to change those defaults (like 'select' some other cases than 
all), and then you must give the new values.

 
 # would I put: scores(c1.rda, choices=1:2, scaling=2, display=sp) in for
 const
 
This question makes no sense.

It seems that a good idea might be that you sit down and peacefully go through 
some simple introductory example (like that in vegandocs(intro)). Of course, 
there is PC-Ord...

Cheers, Jari Oksanen
 
 
 ## S3 method for class 'cca':
 
 points(c1.rda, display = sites, choices = c(1, 2), scaling = 2,
 
arrow.mul, head.arrow = 0.05, select, const, ...)
 
 
 
 ## S3 method for class 'cca':
 
 scores(x, choices=c(1,2), display=c(sp,wa,cn), scaling=2, ...)
 
 
 
 ## S3 method for class 'rda':
 
 scores(x, choices=c(1,2), display=c(sp,wa,cn), scaling=2,
 
const, ...)
 
 
 
 ## S3 method for class 'cca':
 
 summary(object, scaling = 2, axes = 6, display = c(sp, wa,
 
lc, bp, cn), digits = max(3, getOption(digits) - 3), ...)
 
 
 
 ## S3 method for class 'summary.cca':
 
 print(x, digits = x$digits, head = NA, tail = head, ...)
 
 
 
 ## S3 method for class 'summary.cca':
 
 head(x, n = 6, tail = 0, ...)
 
 
 
 ## S3 method for class 'summary.cca':
 
 tail(x, n = 6, head = 0, ...)
 
 * *
 
 * *
 -- 
 Caitlin Porter
 MSc candidate
 Biology Department
 Saint Mary's University - Halifax
 Office: S320
 (902) 719-4815
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] metaMDS - avoid species overlap in plots

2012-06-25 Thread Jari Oksanen
Hello,

This so frequently asked that it is a FAQ. Try

vegandocs(FAQ)

in R and see question 2.18.

It is also such a typical problem that it is in vegan intro: try

vegandocs(intro)

in R after loading vegan and see chapter 2.1.

Gavin Simpson has a blog article on plots, but I'm now on my phone and can't 
check the web easily.

HTH, Jari Oksanen
 alkuperäinen viesti 
Lähettäjä: Gian Maria Niccolò Benucci
Lähetetty:  25.06.2012, 12:40
Vastaanottaja: r-sig-ecology@r-project.org
Aihe: [R-sig-eco] metaMDS - avoid species overlap in plots

Hi community,

Very simple question. How to avoid overlap plot of species in a metaMDS()
diagram?

I used this command...

 text(metaMDS_new2, display=c(species), cex=0.6)

but some species are plotted one over the other and is not simple to read
the diagram.

Thanks for helping,

--
Gian

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Retrieving canonical coefficients and correlation coefficients according to ter Braak using vegan package cca() function

2012-06-14 Thread Jari Oksanen
Joris,
On 14/06/2012, at 17:39 PM, Joris Meys wrote:

 Hi all,
 
 
 I get results that differ from the table given by ter Braak. I can explain
 the differences in sign (that's normal, as the solution of cca is defined
 up to the sign), and I can believe there would be small differences as
 we're talking different ways of estimation, but the canonical coefficients
 for the second axis differ substantially.
 
 My question :
 
 - Did I use the correct methods to extract what ter Braak calls canonical
 coefficients and correlation coefficients?
 - is there a logical explanation for the big difference in some results ?
 
I haven't checked that paper, and can't do that for several days. However, one 
essential difference in canonical coefficients between Canoco and vegan::cca 
is that Canoco standardizes all constraining variables to unit variance whereas 
vegan uses unstandardized variables. Does it help if you use 

Model - cca(hs.spec ~ ., data = scale(hs.ev))

?

Cheers, Jari Oksanen
-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


  1   2   3   >