Re: [R-sig-eco] twinspan classification rules as narrative
Howdy,

TWINSPAN is not on CRAN; it seems that you found it on GitHub. TWINSPAN is an old method, and people seem to be forgetting how it works. Here is some narrative. First, you have defined cut levels to transform your abundance data into binary indicator “pseudospecies”. You give these cut levels in your call. Each species is split by these cut levels into pseudospecies, and the cut level number is appended to the species name. In your example, the indicator pseudospecies at division 8 are actually Cladgray1 and Cladnigr1, where the appended ‘1’ just means that the species occurs, but it can have any abundance value: there is no way of knowing its abundance except for the lower limit (>0). In division 1 you have, for instance, Cladmiti4, which means that the species occurs at least at cut level 4 (at least at quantity 5), but it can have any value above that limit.

Now to the narrative for the division. The rule for division 8 (that you mention in your post) is actually "+Cladnigr1 +Cladgray1 < 1”. Both are at the lowest cut level 1 (present with any abundance); the ‘+’ sign means that both are positive indicators, and you add +1 for every plot where they occur. Were the sign ‘-’, you would add -1 for each presence to give negative scores. Doing this for all indicator species gives you the indicator score: if both species are present, your score is 2; if one is present, your score is 1; and if neither is present, your score is 0. The condition is ‘< 1’, meaning that if neither is present (score 0) the condition is true and you go to final group 16, but if one or both are present (score 1 or 2) the condition is false and you continue to division 17. However, this is a tree, and this narrative rule only applies to division 8 and the 16 sampling units it contains: these are split by this rule. To get to this division with this rule you must have satisfied the previous rules leading to this branch.
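The scoring rule described above is easy to sketch in R. This is a toy illustration only, not code from the twinspan package; the function indicator_score() and the example plot vector are hypothetical.

```r
## Toy sketch of the division rule "+Cladnigr1 +Cladgray1 < 1":
## add +1 for each positive indicator pseudospecies present in a plot
## and -1 for each negative one, then compare the sum to the threshold.
indicator_score <- function(plot_row, pos, neg = character(0)) {
  sum(plot_row[pos] > 0) - sum(plot_row[neg] > 0)
}
x <- c(Cladnigr = 0, Cladgray = 0, Cladmiti = 7)  # neither indicator occurs
s <- indicator_score(x, pos = c("Cladnigr", "Cladgray"))
s       # 0
s < 1   # TRUE: condition satisfied, the plot goes to final group 16
```

With both indicators present the score would be 2, the condition false, and the plot would continue to division 17.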
You can see the branch structure with plot(twb): the internal divisions are shown as squares in the tree, and the final groups and their sizes as terminal leaves. The classification rules give you only the lower limit for each species, and depending on the indicator score threshold, even some of these indicators may be missing in a plot. However, you can use the function twintable to see the actual cut levels for each species. These serve as cover-class values, but give no more detail than the cut levels you defined.

Cheers, Jari

> On 16 Dec 2019, at 16:44, Gonzalez-Mirelis, Genoveva wrote:
>
> Dear all,
>
> I am trying to understand the results from the twinspan function in the R package that has been recently developed (also named twinspan).
>
> Particularly, I would like to be able to derive the classification rules (indicator species and abundance values, or rather ranges) for each terminal group of the twinspan classification.
>
> From this:
>
> library(twinspan)
> data(ahti)
> twb <- twinspan(ahti, cutlevels = c(0, 0.1, 1, 5, 25, 50, 75))
> summary(twb)
>
> I understand that say, for group number 16 (the first terminal group encountered) the indicator species were Cladnigr and Cladgray.
>
> I also understand that the indicator score threshold tells me which path to follow down the tree (left or right). But I struggle to understand just what the indicator score means (1)? And whether it can be related to the original abundance value for those two species at the three relevant sites, namely Ster113, Ster097 and Ster098?
>
> What would be a narrative way to describe this particular branch of the tree?
> Many thanks in advance,
>
> Genoveva
>
> Genoveva Gonzalez Mirelis, Scientist
> Institute of Marine Research
> Nordnesgaten 50
> 5005 Bergen, Norway
> Phone number +47 55238510
>
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] How to get the total variance explained from an envfit object?
Dear Claire Elizabeth Couch,

First let me explain how envfit() works. For continuous environmental variables it actually uses the ordination to predict the environmental variable. Under the hood (bonnet), it fits a first-degree trend surface (a plane in 2D) for the environmental variable over the ordination scores, and the R2 is the proportion of the variable that surface explains. The arrow shown is the gradient of the fitted trend surface, and it points in the direction in which the variable changes most rapidly in a first-degree linear model. Clearly you cannot add these R2 values, because your environmental variables can be (and normally are) inter-correlated.

It seems that you want to work in the other direction from envfit: predict ordination scores from a set of environmental variables. There are many ways of doing this in R, although we do not provide a canned tool for it. You could even do it with an ordinary multiple linear model using lm() from the stats package. Here is how to do it with vegan's rda() function:

library(vegan)
data(varespec, varechem)
ord <- metaMDS(varespec, trace = FALSE)
fit <- rda(scores(ord) ~ ., data = varechem)

The basic output will show you the R2:

Call: rda(formula = scores(ord) ~ N + P + K + Ca + Mg + S + Al + Fe + Mn + Zn + Mo + Baresoil + Humdepth + pH, data = varechem)

              Inertia Proportion Rank
Total         0.13905    1.00000
Constrained   0.11357    0.81681    2
Unconstrained 0.02547    0.18319    2
Inertia is variance

Here the “Proportion” for the Constrained component is the overall R2 = 0.81681. If you want the adjusted R2, it is found with RsquareAdj(fit):

$r.squared
[1] 0.8168084

$adj.r.squared
[1] 0.5318436

However, this only gives the overall R2, not partial R2 values for single variables. You can use anova(fit, by = "margin") to find the (lacking) marginal significances of the unique effects of the variables, though.
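Why the per-variable R2 values cannot be summed is easy to demonstrate with a plain lm() toy example (simulated data, nothing to do with varechem): when predictors are correlated, their individual R2 values double-count shared variance.

```r
## With correlated predictors, individual R2 values overlap, so their
## sum exceeds the R2 of the joint model. Simulated toy data only.
set.seed(1)
x1 <- rnorm(50)
x2 <- x1 + rnorm(50, sd = 0.3)   # x2 strongly correlated with x1
y  <- x1 + x2 + rnorm(50)
r2.x1   <- summary(lm(y ~ x1))$r.squared
r2.x2   <- summary(lm(y ~ x2))$r.squared
r2.both <- summary(lm(y ~ x1 + x2))$r.squared
r2.x1 + r2.x2 > r2.both   # TRUE: the sum double-counts shared variance
```

The same overlap is why the envfit R2 values for inter-correlated environmental variables cannot simply be added up.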
The regression coefficients can be found with coef(), and you probably want them normalized:

> coef(fit, norm = TRUE)
           RDA1        RDA2
N  -0.154511701  0.48579513
P   0.002463991 -0.13179802
...
pH  0.701009027 -0.66724274

You can also simplify this model in the usual way, for instance with ordistep(), which uses permutation tests to drop variables one by one (or you can build the model up, adding variables one by one, if you start with an empty model and use ordistep() or ordiR2step()):

ordistep(fit)             # drop variables
m0 <- update(fit, . ~ 1)  # m0 is an empty model
ordistep(m0, scope=fit)   # add variables to an empty model

(After these, the anova(..., by = "margin") results also give significant effects.)

All this sounds a bit weird to me (or more than “a bit”), but it can be done. I guess some readers will get hiccups over using RDA on NMDS, but this can be done because the NMDS space is metric (the *transfer* function from community dissimilarities to the metric ordination space is non-metric). It also sounds really odd to have ordination scores as the dependent data in RDA, but this exactly answers the problem you presented: predict ordination scores by a set of external variables. After all, RDA is nothing but linear regression for multivariate response data, and there is no need to think of it as an ordination.

Cheers, Jari Oksanen

On 7 Nov 2019, at 22:02, Couch, Claire Elizabeth wrote:

I am analyzing some microbiome data using unconstrained ordination (PCA or NMDS) followed by environmental vector fitting with the envfit function in the vegan package. The output of envfit includes an r2 value for each vector or factor included in the envfit model, but I am interested in the total amount of variation explained by all the vectors/factors, rather than just stand-alone variables.
I presume I cannot simply add up the R2 values assigned to each environmental variable, because there may be overlap in the microbiome variation that is "explained" by each environmental variable. However, there does not seem to be any way of accessing the total r2 value for the model. Using an example dataset, this is what I have tried so far:

library(vegan)
library(MASS)
data(varespec, varechem)
ord <- metaMDS(varespec)
fit <- envfit(ord, varechem, perm = 999)
fit

This shows r2 for each environmental variable, but how do I extract the r2 value for the entire model? I have tried running fit$r, attributes(fit)$r, and Rsquare.Adj(fit), but these all return NULL. I would greatly appreciate any suggestions!
Re: [R-sig-eco] Vegan Function anova.cca: No Significance Codes Legend in Output
Dear Melissa Schindler,

It is neither a “glitch” in vegan nor something wrong with your analysis: the legend for significance stars is only shown when there are stars. The showing of stars and the legend is not controlled by vegan; it is a global option in R ("show.signif.stars"). See the help page ?printCoefmat, which explains the usage of stars.

Cheers, Jari Oksanen

> On 25 Sep 2019, at 20:51, melissa schindler wrote:
>
> Hello all,
>
> I have run a redundancy analysis on Hellinger transformed species abundance data and environmental variables.
>
> Once I ran the RDA I proceeded to run a permutation test on the resulting rda, overall and by "axis", and the outputs did not provide a legend of significance codes.
>
> Even if none of the axes were significant, am I correct in assuming the legend should still be listed in the results of the permutations?
>
> Is this a glitch in vegan 2.5-6 or could something be wrong with my analysis?
>
> *The following is the output of the anova.cca*
>
> anova.cca(rda, by="axis")
> 'nperm' >= set of all permutations: complete enumeration.
> Set of permutations < 'minperm'. Generating entire set.
> Permutation test for rda under reduced model
> Forward tests for axes
> Permutation: free
> Number of permutations: 119
>
> Model: rda(formula = spe.hel ~ Conduct + DO + pH, data = Zscore_env_var)
>          Df Variance       F Pr(>F)
> RDA1      1 0.186195 41.4450 0.2333
> RDA2      1 0.002051  0.4564 0.8833
> Residual  2 0.008985
>
> *Significance code legend should be listed here with asterisks denoting the p-values*
>
>> anova.cca(rda)
> 'nperm' >= set of all permutations: complete enumeration.
> Set of permutations < 'minperm'. Generating entire set.
> Permutation test for rda under reduced model
> Permutation: free
> Number of permutations: 119
>
> Model: rda(formula = spe.hel ~ Conduct + DO + pH, data = Zscore_env_var)
>          Df Variance      F Pr(>F)
> Model     3 0.188246 6.9836  0.225
> Residual  1 0.008985
>
> *Significance code legend should be listed here with asterisks denoting the p-values*
>
> Thanks for your time!
>
> Melissa
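The mechanism is visible with any model that prints p-values through printCoefmat(); a sketch with base R's cars data, nothing vegan-specific. The stars and their legend are governed by the global option "show.signif.stars":

```r
## Stars and their legend come from base R's coefficient printing,
## controlled by the global option "show.signif.stars" (default TRUE).
fit <- lm(dist ~ speed, data = cars)
op <- options(show.signif.stars = FALSE)  # suppress stars and legend
print(summary(fit))
options(op)                               # restore previous setting
print(summary(fit))                       # stars and legend shown again
```

As noted above, even with the option at its default the legend appears only when some p-value actually earns a code.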
Re: [R-sig-eco] transformation of Bray-Curtis in Euclidean
Yes, it is possible, and it always has been when I have checked (which is not a proof). You can verify this by seeing that it gives no negative eigenvalues in principal coordinates analysis (apart from occasional near-zero negatives). The Legendre & Legendre book discusses this.

Cheers, Jari Oksanen

> On 9 May 2019, at 10:21, Irene Adamo wrote:
>
> Hi all,
>
> I have a very simple question: is it possible that the square root of Bray-Curtis values is Euclidean? If not, is there a way to transform Bray-Curtis, which is semi-quantitative, into Euclidean?
>
> thanks a lot!
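The eigenvalue check mentioned above can be sketched in base R without vegan; here Bray-Curtis is built by hand (Manhattan distance divided by the sum of the row totals), and cmdscale() stands in for vegan's wcmdscale(). This is an empirical check on one random matrix, not a proof.

```r
## Empirical check: sqrt of Bray-Curtis shows no real negative
## eigenvalues in principal coordinates analysis, i.e. it embeds in
## Euclidean space. Bray-Curtis constructed from Manhattan distance.
set.seed(42)
x <- matrix(rpois(10 * 6, lambda = 3), nrow = 10)
bray <- as.matrix(dist(x, method = "manhattan")) /
  outer(rowSums(x), rowSums(x), "+")
d <- sqrt(as.dist(bray))
pco <- cmdscale(d, k = nrow(x) - 1, eig = TRUE)
min(pco$eig)   # expected to be negative only at numerical-noise level
```

vegan users would write the same check as sqrt(vegdist(x)) followed by wcmdscale(..., eig = TRUE).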
Re: [R-sig-eco] standard deviation error for EcoTest.sample
No, you do not need to be worried about that warning. You have rarefaction up to the observed community, and that is always constant, hence you get the warning of zero standard deviation. That should be fixed, but I’m too lazy. Contributions are welcome in vegan at https://github.com/vegandevs/vegan/.

cheers, Jari Oksanen

On 11 Jan 2019, at 20:09, Charlotte Reemts wrote:

I am using EcoTest.sample (rareNMtests package) to compare rarefaction curves for 19 vegetation plots on two soil types (alluvial and canyon). The code below produces the following warning (more than 50 times): "In cor(x > 0) : the standard deviation is zero". The test still produces all the expected output. I tried sorting the dataset by SiteType, but still got the warning. Curiously, sorting the data produced a significant difference between the site types (p around 0.3), while there was no difference for the unsorted data (p around 0.09). Should I be concerned about the warnings? Why does sorting the dataset make a difference?
Thanks,
Charlotte

rawdata <- read.table(text = "
Plot SiteType sp1 sp2 sp3 sp4 sp5 sp6 sp7 sp8 sp9 sp10 sp11 sp12 sp13 sp14 sp15 sp16 sp17 sp18 sp19 sp20 sp21 sp22 sp23 sp24 sp25 sp26 sp27 sp28 sp29 sp30 sp31 sp32 sp33 sp34 sp35
2 canyon 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0
3 alluvial 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
5 alluvial 1 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
6 alluvial 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0
7 alluvial 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
8 alluvial 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0
10 alluvial 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 1 1 0 0
11 canyon 1 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0
12 canyon 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
13 canyon 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0
14 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
15 canyon 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0
16 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
17 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
18 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
19 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0
20 canyon 1 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1
22 alluvial 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 0 0 1 0 0
23 alluvial 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0
", header = TRUE)

data <- rawdata[, -1]
rownames(data) <- rawdata[, 1]
library(rareNMtests)
test.data <- EcoTest.sample(data[, -1], by = data$SiteType, MARGIN = 1, trace = F)
# error message and no significant difference
data2 <- data[do.call(order, data), ]
test.data2 <- EcoTest.sample(data2[, -1], by = data2$SiteType, MARGIN = 1, trace = F)
# error message and significant difference
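The warning itself is plain R behaviour and easy to reproduce without rareNMtests: cor() warns whenever a column has zero variance, which is exactly what happens when the rarefaction value at full sample size (always the observed richness) is constant. A minimal sketch:

```r
## cor() warns when a column is constant (zero standard deviation);
## EcoTest.sample merely passes this base-R warning through.
m <- cbind(a = c(1, 0, 1, 1),
           b = c(1, 1, 1, 1))              # column b has zero variance
w <- tryCatch(cor(m > 0), warning = conditionMessage)
w   # "the standard deviation is zero"
```

The correlations involving the constant column come out as NA, but the rest of the computation is unaffected, which is why the test still produces all its output.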
Re: [R-sig-eco] 2-way adonis (PERMANOVA) incl interaction - how to test for main effects?
ordered models (which are equivalent to Type II SS tests). And these results will tell you pretty much the same thing as Type III tests if there is little or no interaction. I would not worry about trying to estimate the main effects while controlling for the interaction (Ellen’s question), which cannot be done using Type I or Type II SS in a 2-way permanova using adonis. But why would you want to? The lack of a balanced design means that the main effects and the interaction are not independent of one another. Forcing that independence by using Type III SS can only work by essentially "throwing away" some of the information associated with the main effects, possibly resulting in an overly conservative test. The smaller the interaction, however, the less is thrown away and the less it matters.

Steve

Stephen Brewer
jbre...@olemiss.edu
Professor, Department of Biology
PO Box 1848
University of Mississippi
University, Mississippi 38677-1848
Brewer web page - https://jstephenbrewer.wordpress.com
FAX - 662-915-5144
Phone - 662-202-5877

On Oct 31, 2018, at 5:45 PM, Gian Maria Niccolò Benucci wrote:

Thank you Jari,

So to test whether there is a significant interaction I should use Stage:Growhouse, i.e. A:B. This will test the interaction and the main effects, which are marginal and so removed. How does it matter then whether I include by="margin" or not? The R2 values are the same (please see below) but the p-value changes. I assume the second way is the more correct one, is it?
*> adonis2(t(otu_fungi_out) ~ Stage : Growhouse, data=metadata_fungi_out, permutations=)*
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations:

adonis2(formula = t(otu_fungi_out) ~ Stage:Growhouse, data = metadata_fungi_out, permutations = )
                Df SumOfSqs      R2      F Pr(>F)
Stage:Growhouse  3   1.0812 0.23075 1.9998 0.0211 *
Residual        20   3.6045 0.76925
Total           23   4.6857 1.00000
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

*> adonis2(t(otu_fungi_out) ~ Stage : Growhouse, data=metadata_fungi_out, by = "margin", permutations=)*
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations:

adonis2(formula = t(otu_fungi_out) ~ Stage:Growhouse, data = metadata_fungi_out, permutations = , by = "margin")
                Df SumOfSqs      R2      F Pr(>F)
Stage:Growhouse  3   1.0812 0.23075 1.9998  0.006 **
Residual        20   3.6045 0.76925
Total           23   4.6857 1.00000
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Cheers, Gian

On Tue, 30 Oct 2018 at 05:47, Jari Oksanen wrote:

Hello Gian,

These formulae expand into different models. Compare

model.matrix(~ Stage:Growhouse, data=metadata_fungi_out)
model.matrix(~ Stage*Growhouse, data=metadata_fungi_out)

The first model (Stage:Growhouse) will also contain (implicitly) the main effects, and all these terms are marginal and can be removed, whereas Stage*Growhouse expands into explicit main effects and interaction effects, and only the interaction effects are marginal and can be removed. This is also reflected in the degrees of freedom in your anova table: in the first case Stage:Growhouse has 3 df, and in the latter only 1 df (and the ignored main effects had 2 df).

Ciao, Jari

On 29 Oct 2018, at 19:11, Gian Maria Niccolò Benucci wrote:

Hello Jari,

It is a little bit confusing.
If A*B unfolds into A+B+A:B, then A:B is the real interaction component. So, which of the calls below will test the variance for the interaction alone?

adonis2(t(otu_fungi_out) ~ *Stage : Growhouse*, data=metadata_fungi_out, by = "margin", permutations=)
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations:

adonis2(formula = t(otu_fungi_out) ~ Stage:Growhouse, data = metadata_fungi_out, permutations = , by = "margin")
                Df SumOfSqs      R2      F Pr(>F)
Stage:Growhouse  3   1.0812 0.23075 1.9998  1e-04 ***
Residual        20   3.6045 0.76925
Total           23   4.6857 1.00000
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

adonis2(t(otu_fungi_out) ~ *Stage * Growhouse*, data=metadata_fungi_out, by = "margin", permutations=)
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations:

adonis2(formula = t(otu_fungi_out) ~ Stage * Growhouse, data = metadata_fungi_out, permutations = , by = "margin")
                Df SumOfSqs      R2      F Pr(>F)
Stage:Growhouse  1   0.2171 0.04633 1.2045 0.2443
Resid
Re: [R-sig-eco] 2-way adonis (PERMANOVA) incl interaction - how to test for main effects?
Hello Gian,

These formulae expand into different models. Compare

model.matrix(~ Stage:Growhouse, data=metadata_fungi_out)
model.matrix(~ Stage*Growhouse, data=metadata_fungi_out)

The first model (Stage:Growhouse) will also contain (implicitly) the main effects, and all these terms are marginal and can be removed, whereas Stage*Growhouse expands into explicit main effects and interaction effects, and only the interaction effects are marginal and can be removed. This is also reflected in the degrees of freedom in your anova table: in the first case Stage:Growhouse has 3 df, and in the latter only 1 df (and the ignored main effects had 2 df).

Ciao, Jari

> On 29 Oct 2018, at 19:11, Gian Maria Niccolò Benucci wrote:
>
> Hello Jari,
>
> It is a little bit confusing. If A*B unfolds into A+B+A:B, then A:B is the real interaction component.
> So, which of the calls below will test the variance for the interaction alone?
>
>> adonis2(t(otu_fungi_out) ~ *Stage : Growhouse*, data=metadata_fungi_out, by = "margin", permutations=)
> Permutation test for adonis under reduced model
> Marginal effects of terms
> Permutation: free
> Number of permutations:
>
> adonis2(formula = t(otu_fungi_out) ~ Stage:Growhouse, data = metadata_fungi_out, permutations = , by = "margin")
>                 Df SumOfSqs      R2      F Pr(>F)
> Stage:Growhouse  3   1.0812 0.23075 1.9998  1e-04 ***
> Residual        20   3.6045 0.76925
> Total           23   4.6857 1.00000
> ---
> Signif.
codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>> adonis2(t(otu_fungi_out) ~ *Stage * Growhouse*, data=metadata_fungi_out, by = "margin", permutations=)
> Permutation test for adonis under reduced model
> Marginal effects of terms
> Permutation: free
> Number of permutations:
>
> adonis2(formula = t(otu_fungi_out) ~ Stage * Growhouse, data = metadata_fungi_out, permutations = , by = "margin")
>                 Df SumOfSqs      R2      F Pr(>F)
> Stage:Growhouse  1   0.2171 0.04633 1.2045 0.2443
> Residual        20   3.6045 0.76925
> Total           23   4.6857 1.00000
>
> The results are clearly very different. Also, in a normal adonis call I didn't have any significance for the interaction that I have instead if I use A:B. So ~ A*B will not test for interactions at all?
>
>> *adonis*(t(otu_fungi_out) ~ Stage * Growhouse, data=metadata_fungi_out, permutations=)
> Call:
> adonis(formula = t(otu_fungi_out) ~ Stage * Growhouse, data = metadata_fungi_out, permutations = )
>
> Permutation: free
> Number of permutations:
>
> Terms added sequentially (first to last)
>
>                 Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
> Stage            1    0.4877 0.48769  2.7060 0.10408 0.0247 *
> Growhouse        1    0.3765 0.37647  2.0889 0.08034 0.0542 .
> Stage:Growhouse  1    0.2171 0.21708  1.2045 0.04633 0.2507
> Residuals       20    3.6045 0.18023         0.76925
> Total           23    4.6857                 1.00000
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Thank you!
>
> Gian
>
> On Tue, 16 Oct 2018 at 08:54, Jari Oksanen wrote:
>
>> On 16/10/18 11:23, Torsten Hauffe wrote:
>>> "adonis2(speciesdataset~A*B, by="margin") but then only the effect of the interaction is tested."
>>>
>>> This is not entirely correct.
>>> adonis2(speciesdataset~A:B, by="margin") would test the interaction alone.
>>> ~A*B unfolds to ~A+B+A:B
>>
>> Well, it was correct: the only **marginal** effect in ~A+B+A:B is A:B (A and B are not marginal), and by = "margin" will only analyse marginal effects.
>>
>> Cheers, Jari Oksanen
>>
>>> On Tue, 16 Oct 2018 at 11:51, Ellen Pape wrote:
>>>
>>>> Hi all,
>>>>
>>>> I don't know whether this is the correct mailing group to address this question:
>>>>
>>>> I would like to perform a 2-way permanova analysis in R (using adonis in vegan). By default you are performing sequential tests (by="terms"), so when you have 2 or more factors, the order of these factors matters. However, since I wanted to circumvent this, I chose the option by="margin" (adonis2(speciesdataset~A*B, by="margin")), but then only the effect of the interaction is tested. On the "help page" of anova.cca
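The df bookkeeping in Jari's model.matrix() comparison can be verified with a toy example (two made-up 2-level factors, not the Stage/Growhouse data): both expansions span the same 4-dimensional cell space, but in ~A:B the single marginal term carries all 3 non-intercept df, while in ~A*B the interaction term keeps only 1 df.

```r
## ~A:B codes the whole 2x2 cell structure in one marginal term (3 df),
## whereas ~A*B splits the same space into A (1), B (1), and A:B (1 df).
A <- factor(rep(c("low", "high"), each = 4))
B <- factor(rep(c("ctrl", "trt"), times = 4))
m1 <- model.matrix(~ A:B)   # one marginal interaction term
m2 <- model.matrix(~ A * B) # explicit main effects plus interaction
qr(m1)$rank   # 4: intercept + 3 df for A:B
qr(m2)$rank   # 4: intercept + 1 (A) + 1 (B) + 1 (A:B)
```

The two design matrices have the same rank, which is why by = "margin" on ~A:B tests the interaction together with the implicit main effects, not the interaction alone.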
Re: [R-sig-eco] 2-way adonis (PERMANOVA) incl interaction - how to test for main effects?
vegan::adonis2 only handles marginal effects with by = “margin” (and hence only the term A:B of A*B), but the RVAideMemoire package has the function adonis.II that can also do “type II” and “type III” tests (whatever these mean with adonis), which may be what you are looking for. I haven’t checked how these tests are implemented, but you may do that at your leisure if you think this is what you want.

Cheers, Jari Oksanen

> On 16 Oct 2018, at 14:53, Ellen Pape wrote:
>
> Hi,
>
> I know that A*B = A+B+A:B, but in this case, doing an adonis2 and specifying by="margin" will only do the test for the interaction, not the main effects. If one chooses by="terms", you will get a test for the main effects and the interaction, but then the order of factors matters.
>
> Best regards
> Ellen
>
> On Tue, 16 Oct 2018 at 12:23, Torsten Hauffe wrote:
>
>> "adonis2(speciesdataset~A*B, by="margin") but then only the effect of the interaction is tested."
>>
>> This is not entirely correct.
>> adonis2(speciesdataset~A:B, by="margin") would test the interaction alone. ~A*B unfolds to ~A+B+A:B
>>
>> On Tue, 16 Oct 2018 at 11:51, Ellen Pape wrote:
>>
>>> Hi all,
>>>
>>> I don't know whether this is the correct mailing group to address this question:
>>>
>>> I would like to perform a 2-way permanova analysis in R (using adonis in vegan). By default you are performing sequential tests (by="terms"), so when you have 2 or more factors, the order of these factors matters. However, since I wanted to circumvent this, I chose the option by="margin" (adonis2(speciesdataset~A*B, by="margin")), but then only the effect of the interaction is tested. On the "help page" of anova.cca it says: "if you select by="margin" -> the current function only evaluates marginal terms. It will, for instance, ignore main effects that are included in interaction terms."
>>>
>>> My question now is: can I somehow get the main effects tested anyhow, when the interaction term is not significant?
>>>
>>> Thanks,
>>> Ellen
Re: [R-sig-eco] 2-way adonis (PERMANOVA) incl interaction - how to test for main effects?
On 16/10/18 11:23, Torsten Hauffe wrote:
> "adonis2(speciesdataset~A*B, by="margin") but then only the effect of the interaction is tested."
>
> This is not entirely correct.
> adonis2(speciesdataset~A:B, by="margin") would test the interaction alone.
> ~A*B unfolds to ~A+B+A:B

Well, it was correct: the only **marginal** effect in ~A+B+A:B is A:B (A and B are not marginal), and by = "margin" will only analyse marginal effects.

Cheers, Jari Oksanen

> On Tue, 16 Oct 2018 at 11:51, Ellen Pape wrote:
>
>> Hi all,
>>
>> I don't know whether this is the correct mailing group to address this question:
>>
>> I would like to perform a 2-way permanova analysis in R (using adonis in vegan). By default you are performing sequential tests (by="terms"), so when you have 2 or more factors, the order of these factors matters. However, since I wanted to circumvent this, I chose the option by="margin" (adonis2(speciesdataset~A*B, by="margin")), but then only the effect of the interaction is tested. On the "help page" of anova.cca it says: "if you select by="margin" -> the current function only evaluates marginal terms. It will, for instance, ignore main effects that are included in interaction terms."
>>
>> My question now is: can I somehow get the main effects tested anyhow, when the interaction term is not significant?
>>
>> Thanks,
>> Ellen
Re: [R-sig-eco] distance matrix with chao-jaccard method
The vegan::vegdist() function **has** the Chao-Jaccard index (method = “chao”). In addition, vegan has the function chaodist, which is similar to designdist() but uses the Chao terms (U, V), allowing you to define any Chao-type distance (see ?chaodist for examples). Vegan has these choices, but I don’t endorse them: it’s all up to your responsibility.

cheers, Jari Oksanen

> On 14 Oct 2018, at 19:07, Irene Adamo wrote:
>
> Dear all,
>
> I would like to create a distance matrix based on the Chao-Jaccard similarity index, based on raw abundances, in R, but so far I have not been able to find a package that does it. The vegan package does not have this option, and the dis.chao function of CommEcol creates a dissimilarity matrix. I hope that you can help me find a solution.
>
> thanks in advance!
> Irene Adamo
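A minimal usage sketch, assuming a vegan version recent enough to have both functions (check your installed version), using vegan's bundled varespec data:

```r
## Chao-Jaccard dissimilarity directly from vegdist(); chaodist() can
## build other Chao-type indices from the U and V terms (see ?chaodist).
library(vegan)
data(varespec)
d <- vegdist(varespec, method = "chao")
class(d)   # a "dist" object, ready for e.g. metaMDS() or hclust()
```

The result is an ordinary "dist" object, so it plugs into any downstream function that accepts dissimilarities.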
Re: [R-sig-eco] instability of NMDS
Dear Chitra,

What do you mean by "extracting scores"? Do you have the result object? In that case, use the function scores() to extract the scores. Do you mean that you cannot get the result object? If so, have you read the section on "Convergence Problems" in the function documentation (?metaMDS)? Have you worked as suggested in that document? What failed when you followed the instructions?

Cheers, Jari Oksanen

From: R-sig-ecology on behalf of Chitra Baniya
Sent: 22 November 2017 05:52
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] instability of NMDS

Hi dear R and vegan users,

I am an R version 3.4.1 user on the platform x86_64-pc-linux-gnu (64-bit). My vegan package is 2.4-4.

Please suggest the best way to extract the sample and species scores. Here I got stuck with instability of NMDS.

Thank you in advance.

Chitra

Chitra Bahadur Baniya, M Sc, M Phil, PhD
Associate Professor
Central Department of Botany
Tribhuvan University
Kirtipur
Kathmandu, Nepal
orcid.org/-0002-8746-7601
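For the first case (you do have a result object), extraction looks like the sketch below, using vegan's example data since the original data are not available:

```r
## Extract sample (site) and species scores from a metaMDS result.
library(vegan)
data(varespec)
ord <- metaMDS(varespec, trace = FALSE)
site.sc <- scores(ord, display = "sites")    # one row per sample
spec.sc <- scores(ord, display = "species")  # one row per species
head(site.sc)
```

Both objects are plain matrices of ordination coordinates that can be written out with write.csv() or used for custom plotting.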
Re: [R-sig-eco] Factors in partial RDA part 2
I said earlier that you can use either dummy variables or factors with little difference. However, there are cases where the difference becomes important: that is the case when you start playing with individual dummy variables as if they were real variables. Don't do that! Do not use forward.sel with dummy variables: it makes no sense. Do not look at the VIF values of single factor levels: that rarely makes sense, and they can never suggest removing single levels of a factor. It is the whole factor, in or out. Never partly in, partly out. If you are going to play like that, you really should switch to the standard R way of defining your factor as a factor. The highish VIF values for some levels are probably triggered by correlations with some continuous variables in your model. There should be a generalized VIF in vegan that gives one statistic for the whole factor. Contributions are welcome at http://github.com/vegandevs/vegan.

cheers, Jari Oksanen

From: R-sig-ecology on behalf of Andrew Halford
Sent: 08 February 2017 10:14
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] Factors in partial RDA part 2

Hi Listers,

Further to my last post, I am seeking more insight into how to interpret the effects of factors in an RDA analysis. I think of a factor as a single variable whose influence on the observed fish distribution patterns I would like to quantify, along with a bunch of other numerical variables. To do the analysis, this factor (called 'geom') is turned into a number of dummy variables (seven, actually). The conceptual problem I am having is that when I call vif.cca to check for collinearity, for example, the output suggests I should remove some of the dummy variables making up the levels of my factor. Doing this would leave me with only 3 of the dummy variables out of the original 8 to put into the RDA.
I then don't see that I am actually testing the Factor 'geom' anymore but rather just individual variables representing a couple of the different levels of the original Factor. How do I proceed with this? The same conundrum for me is seen when I run the forward.sel command to look for the most efficient number of explanatory variables to have in the final model. The process selects only some of the dummy variables to include in the model. Again I struggle to see how I am testing or including the full effects of the Factor 'geom' if only a few of the dummy variables are actually included in the model.

# here is the model run with all the potential explanatory variables ('geom' is the FACTOR with 7 levels)
> fish.env <- rda(fish.h ~ coral_cover + macroalgae + turf_algae_sqrt + ccc_4thrt + rubble_sqrt + reef_slope_sqrt + rugosity + exposure + min_d_sqrt + d_range + chl_a_log + popn_density_4throot + fp + protection + geom, data=env.factor)

# collinearity assessment - the results favour dropping 3 of the 'geom' dummy variables leaving only 2 for the model.
> vif.cca(fish.env)
         coral_cover           macroalgae      turf_algae_sqrt            ccc_4thrt          rubble_sqrt      reef_slope_sqrt
            2.688656             3.099972             2.849219             1.771637             2.411291             2.418953
            rugosity             exposure           min_d_sqrt              d_range            chl_a_log popn_density_4throot
            2.967752             3.433961             2.587696             2.643991             3.626107             4.571781
                  fp           protection           geomgeo_bl         geomgeo_cbrc        geomgeo_isefr        geomgeo_isprc
            4.059624             3.195210             4.329852            12.657270             9.570015            12.052385
        geomgeo_lefr         geomgeo_oefr
            7.347812            17.090731

# here I have kept all the 'geom' dummy variables to submit to forward selection and it only selects 3 of them, hence it doesn't feel that I am actually including a Factor 'geom' in the model but rather just a few individual dummy variables?
> forward.sel(fish.h, env.dummy3, adjR2thresh=R2a.all_fish_env)
Testing variable 1
Testing variable 2
Testing variable 3
Testing variable 4
Testing variable 5
Testing variable 6
Testing variable 7
Testing variable 8
Procedure stopped (alpha criteria): pvalue for variable 8 is 0.092000 (superior to 0.05)
        variables order         R2     R2Cum   AdjR2Cum        F  pval
1        exposure     8 0.07086240 0.0708624 0.04925455 3.279475 0.001
2              fp    13 0.04756799 0.1184304 0.07645089 2.266248 0.002
3       geo_isefr    16 0.04571706 0.1641474 0.10298750 2.242500 0.003
4       chl_a_log    11 0.03812686 0.2022743 0.12250174 1.911778 0.009
5          geo_bl    17 0.03423972 0.2365140 0.13863121 1.749016 0.008
6       geo_isprc    21 0.03384005 0.2703541 0.15514682 1.762391 0.012
7 reef_slope_sqrt     6 0.03466752 0.3050216 0.17353919 1.845666 0.007
Any advice appreciated cheers Andy -- Andrew Halford Ph.D Research Scientist (Kimberley Marine Parks) Dept. Pa
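The advice above (test the whole factor, never single levels) can be sketched with vegan's bundled dune data; dune.env$Management plays the role of the 'geom' factor here:

```r
library(vegan)
data(dune)
data(dune.env)

## pass the factor itself: R expands it to dummy variables internally
mod <- rda(dune ~ A1 + Management, data = dune.env)

## VIFs are reported per level contrast; high values for single levels
## do not justify dropping those levels -- the factor is in or out
vif.cca(mod)

## test the factor as a whole with a permutation test
anova(mod, by = "terms", permutations = 199)
```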
Re: [R-sig-eco] Factors in partial RDA
Andy, That’s correct: if you have coded your variable as a single factor, R (and hence vegan) will automatically create the dummy variables. If you have originally coded your variable as a set of dummy variables, and you have done that correctly, you will get the same result. If you do not drop one of the dummy variables, R will do it for you and tell you that some of your original variables were aliased. This will not influence the results, though. In conclusion: everything goes, but it is best to use the R way and define your factor as a single factor. Cheers, Jari Oksanen > On 7 Feb 2017, at 10:48 am, Andrew Halford <andrew.half...@gmail.com> wrote: > > Hi Listers, > > I am running a partial RDA analysis and I have an environmental dataset > which includes a non-ordered factor representing reef type. > > I want to test for a significant linear trend in the environmental dataset > as part of the process of testing for spatial autocorrelation. > > e.g. lineartest <- anova(rda(env,latlong)) > > I have coded this Factor variable up as a group of dummy variables equal to > the number of different reef types to run the test, but I am unsure if I > need to drop one of the dummy variables before running the linear test? > > I will revert back to a single Factor variable when I run the RDA as told > in the vegan documentation? > > My reading tells me that the call to RDA in vegan will automatically deal > with creating dummy variables from a Factor variable presented in the model, > but this is not the case > > cheers > > Andy > > > Andrew Halford Ph.D > Research Scientist (Kimberley Marine Parks) > Dept. Parks and Wildlife > Western Australia > > Ph: +61 8 9219 9795 > Mobile: +61 (0) 468 419 473 > > [[alternative HTML version deleted]] > > ___ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
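The point about factor coding versus hand-made dummies and aliasing can be illustrated; a sketch with vegan's dune data, where Management is a 4-level factor:

```r
library(vegan)
data(dune)
data(dune.env)

## factor coding: R creates the dummy variables behind the scenes
m1 <- rda(dune ~ Management, data = dune.env)

## hand-made dummies: one 0/1 column per level; since the columns sum
## to one, one of them is redundant and gets reported as aliased
dummies <- model.matrix(~ Management - 1, dune.env)
m2 <- rda(dune ~ dummies)

## both parametrizations span the same space: constrained inertia agrees
all.equal(m1$CCA$tot.chi, m2$CCA$tot.chi)
```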
Re: [R-sig-eco] Errors with Simprof for cluster significance
Ansley, I think you may have an old version of clustsig: the current one should not give an error of missing undef.zero. Please check if you need to update. cheers, j.o. > On 7 Oct 2016, at 19:21 pm, Sarah Goslee wrote: > > I can't duplicate your error. Assuming the attached document describes > your data, it works fine. I changed the code to use braycurtis instead > of euclidean, as you wanted, and it does return a warning, just as the > help file documents. > > So you'll need to provide more information. > > > data <- structure(list(necsur = c(1L, 4L, 0L, 8L, 0L, 1L), necame = c(4L, > 5L, 9L, 9L, 4L, 7L), niccar = c(1L, 1L, 1L, 2L, 1L, 4L), nicorb = c(2L, > 20L, 23L, 26L, 3L, 12L), nicpus = c(0L, 0L, 1L, 0L, 0L, 0L), > nictor = c(0L, 2L, 1L, 3L, 2L, 1L), delgib = c(10L, 31L, > 47L, 48L, 15L, 55L), cancha = c(5L, 6L, 4L, 4L, 1L, 6L), > melbis = c(3L, 0L, 1L, 3L, 0L, 1L), atelec = c(4L, 6L, 28L, > 22L, 8L, 52L), copmin = c(0L, 0L, 1L, 1L, 0L, 1L), ontcon = c(3L, > 3L, 11L, 7L, 1L, 2L), ontdep = c(2L, 0L, 0L, 0L, 0L, 0L), > onthec = c(17L, 15L, 9L, 6L, 6L, 2L), ontstr = c(0L, 0L, > 0L, 1L, 1L, 0L), onttau = c(20L, 13L, 6L, 2L, 0L, 2L), ontpen = c(2L, > 3L, 5L, 3L, 2L, 4L), onttub = c(2L, 3L, 4L, 1L, 1L, 0L), > phavin = c(1L, 0L, 0L, 0L, 0L, 1L), Phyili = c(0L, 0L, 0L, > 0L, 0L, 1L), canvir = c(0L, 1L, 0L, 0L, 0L, 0L), hybill = c(1L, > 0L, 0L, 0L, 0L, 0L), cyclev = c(0L, 0L, 0L, 0L, 1L, 1L), > galjan = c(0L, 0L, 1L, 1L, 0L, 0L), cyclosig = c(0L, 0L, > 0L, 0L, 1L, 0L), omomon = c(1L, 2L, 4L, 10L, 1L, 6L), trofov = c(1L, > 2L, 3L, 1L, 0L, 1L), trouni = c(1L, 0L, 0L, 1L, 0L, 0L), > troter = c(0L, 1L, 1L, 0L, 0L, 0L), eusass = c(9L, 8L, 23L, > 14L, 11L, 28L), hiscoe = c(2L, 1L, 10L, 4L, 2L, 4L), hisabb = c(0L, > 0L, 0L, 0L, 2L, 0L), cremax = c(4L, 7L, 1L, 2L, 2L, 5L), > plamac = c(1L, 0L, 3L, 2L, 2L, 2L), plafem = c(0L, 0L, 0L, > 0L, 0L, 0L), plafos = c(1L, 1L, 3L, 2L, 1L, 2L), placom = c(6L, > 3L, 3L, 10L, 13L, 7L), tacfim = c(0L, 0L, 1L, 0L, 0L, 0L)), .Names > = c("necsur", > "necame",
"niccar", "nicorb", "nicpus", "nictor", "delgib", "cancha", > "melbis", "atelec", "copmin", "ontcon", "ontdep", "onthec", "ontstr", > "onttau", "ontpen", "onttub", "phavin", "Phyili", "canvir", "hybill", > "cyclev", "galjan", "cyclosig", "omomon", "trofov", "trouni", > "troter", "eusass", "hiscoe", "hisabb", "cremax", "plamac", "plafem", > "plafos", "placom", "tacfim"), row.names = c("AP-0", "AP-100", > "AP-200", "AP-300", "ST-0", "ST-100"), class = "data.frame") > > >> simprof(data, num.expected=1000, num.simulated=999, > + method.cluster="average", method.distance="braycurtis", > + method.transform="identity", alpha=0.05, > + sample.orientation="row", const=0, > + silent=TRUE, increment=100, > + undef.zero=TRUE, warn.braycurtis=TRUE) > $numgroups > [1] 2 > > $pval
>      [,1] [,2]        [,3]
> [1,]   -3   -4          NA
> [2,]   -6    1 0.461461461
> [3,]   -1   -2          NA
> [4,]   -5    3 0.905905906
> [5,]    2    4 0.003003003
> > $hclust > > Call: > hclust(d = rawdata.dist, method = method.cluster) > > Cluster method : average > Number of objects: 6 > > > $significantclusters > $significantclusters[[1]] > [1] "ST-100" "AP-200" "AP-300" > > $significantclusters[[2]] > [1] "ST-0" "AP-0" "AP-100" > > > Warning messages: > 1: This version of the Bray-Curtis index does not use standardization. > 2: To use the standardized version, use "actual-braycurtis". > 3: See the help documentation for more information. > > > On Thu, Oct 6, 2016 at 8:43 PM, Ansley Silva wrote: >> Hello all, >> >> I'm using R version 3.3.1, clustsig package. >> >> I have a community dataset and I originally used hclust in the vegan >> package to get dendrograms, however I need significance for the groups that >> were formed. I'd really like to use SIMPROF to look for significance among >> the groups, but I am running into errors.
>> >> >> These are the errors I get: >> >>> simprof(apst, num.expected=1000, num.simulated=999, >> method.cluster="average", method.distance="braycurtis", alpha=0.05, >> sample.orientation = "column", const = 0, silent = TRUE,increment=100) >> >> Error: argument "undef.zero" is missing, with no default >> In addition: Warning messages: >> 1: This version of the Bray-Curtis index does not use standardization. >> 2: To use the standardized version, use "actual-braycurtis". >> 3: See the help documentation for more information. >> >>> simprof(apst, num.expected=1000, num.simulated=999, >> method.cluster="average", method.distance="actual-braycurtis", alpha=0.05, >> sample.orientation = "column", const = 0, silent = TRUE,increment=100) >> >> Error in if (denom != 0) { : missing value where TRUE/FALSE needed >> >>> simprof(apst, num.expected=1000, num.simulated=999, >> method.cluster="average", method.distance="braycurtis", alpha=0.05, >> sample.orientation =
Re: [R-sig-eco] raremax value >2, cannot run rarefy
I cannot reproduce anything of this. I cannot see your csv files, and cannot reproduce the analysis. I can see one attached file (raremax.txt) that has a structure that you call "data". When I use that in place of patp (which I don't have), everything runs smoothly. I have no idea what the warnings you got were -- did you use warnings() to see them as the output suggested? What were those warnings? Did they give any useful hints? A few words about the extremes to which you drove rarefy: if you rarefy your community to one individual, you will have one species (and no error), and if you rarefy to zero individuals, you will have zero species. It never occurred to me to test the function with these absurd values. It is nice to see that zero species with zero error was correctly returned. However, it seems that with rarefaction to one individual the number of species (1) was correctly reported, but the function tried in vain to estimate the standard error and ran into numerical problems: the standard error should be exactly zero, but the function found zero only numerically (about plus/minus 0.0003), and when this numerical zero was negative, you got a warning about taking the square root of negative values of magnitude sqrt(-1e-16). You cannot meaningfully rarefy to one individual. The lowest you can meaningfully rarefy to is two individuals. You cannot meaningfully ask how many species you have if you have nothing (zero individuals), or if you only have one individual. Think about it. Cheers, Jari Oksanen From: R-sig-ecology <r-sig-ecology-boun...@r-project.org> on behalf of Ansley Silva <daily.p...@gmail.com> Sent: 04 August 2016 23:30 To: r-sig-ecology@r-project.org Subject: [R-sig-eco] raremax value >2, cannot run rarefy Hello, R version 3.2.2, using the vegan package. I am using the rarefy function to get rarefied values for my rarefaction curves. Please see the attached data and code.
Instead of getting an output from rarefy function of values for S and SE, I am getting the following message: "There were 20 warnings (use warnings() to see them)" It seems that the issue might be that my raremax value for this dataset is equal to 1. When I ran the same code on a dataset with a raremax of 5, everything is okay. When I ran the same code on a dataset with a raremax of 0, my output of S and SE were all 0. I am looking for an explanation for this. Is there a way to get better values for the output? Would PC-ORD do the same thing? Thanks much, -- Ansley Silva *"The clearest way into the Universe is through a forest wilderness." John Muir* *Graduate Research Assistant* *University of Georgia* *D.B. Warnell School of Forestry and Natural Resources* *180 East Green Street* *Athens, GA 30602* ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
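The lower limit discussed above (rarefaction makes sense only down to two individuals) can be sketched with vegan's bundled BCI data standing in for the poster's community matrix:

```r
library(vegan)
data(BCI)

## smallest site total: the usual reference sample size
raremax <- min(rowSums(BCI))

## two individuals is the smallest meaningful rarefaction sample
S2 <- rarefy(BCI, sample = 2, se = TRUE)

## rarefy all sites to the smallest site total, with standard errors
Sr <- rarefy(BCI, sample = raremax, se = TRUE)

## rarefaction curves with a reference line at raremax
rarecurve(BCI, sample = raremax)
```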
Re: [R-sig-eco] Follow-up to Vegan metaMDS: unusual first run stress values with large data set
Ewan, You already got some good hints from Peter Minchin (outside this list), but here are some comments. > On 2 May 2016, at 20:22 pm, Ewan Isherwood <ewan.isherw...@gmail.com> wrote: > > Hi r-sig-ecology! > > This is mostly a message for Jari Oksanen or another Vegan developer that > may be working specifically with metaMDS, but I'm opening up to anyone that > has any interest in this. First of all, you can see my original post here: > http://r-sig-ecology.471788.n2.nabble.com/Vegan-metaMDS-unusual-first-run-stress-values-with-large-data-set-td7577720.html > > Basically, I'm having the same issues with the metaMDS engine as above (R > 3.2.5, Vegan 2.3-5). This time my dataset is larger at 9239 sites x 85 > species. > > I've tried adjusting the sfgrmin value up to an absurd 1e-10,000,000 > (decreasing this value to -7 resolved the issue last time) > > I've tried upping the sratmax to about 0.99 recurring with 77 9's (I don't > think this should have an effect since it's concerned with the iterations > stopping when the stress ratio between two iterations goes above the > inputted value) > Data sets of this size can really be difficult for NMDS because the location of any single point has only a tiny effect on the stress. In this kind of situation, the first thing is to find out why the analysis stops. There are three stopping criteria, and you have changed two (steepness of gradient sfgrmin and the change in stress sratmax). Your changes are really absurd since they exceed the numerical accuracy of digital computers. If you launch metaMDS() with argument trace=2, you will get more verbose output that reports the stopping criterion used. If it is always the same criterion, you could make that one stricter (but stay within the numbers digital computers can handle). My first guess is that with data of these dimensions you very easily stop because you exceed the maximum number of iterations. If that happens, you actually stop before reaching a solution, and you should increase maxit.
Only after checking these things should you proceed to look at the peculiarities of your data (disjunct or nearly disjunct subsets etc.). cheers, Jari Oksanen > I've tried using the Jaccard and Bray methods (I don't think this should > have an effect) > > I've trialled 3-6 dimensions randomly (this has in the past affected the > result, but that might be because of other factors) > > I have always used the noshare = TRUE option otherwise it ejects some of > the sampling points with rare species to astronomical values on one or more > axes > > I've tried iterations of this about 20-30 times but it still won't ever > give me a best solution that isn't the first run. Here is the basic code: > > metaMDS(PSU.sp, k= x, distance = "jaccard", sfgrmin = x, sratmax = x, > noshare = TRUE) > > I'm happy to privately share the raw dataset with Jari Oksanen if he's > interested in this phenomenon, but I would have to seek permission for > anyone else since I do not own it. In the meantime I will investigate other > methods to analyse this data, which shouldn't be an issue. Since my dataset > is unusually large for this method, this is probably more for curiosity's > sake for the Vegan developers. > > Thanks for your help, > > Ewan Isherwood > > [[alternative HTML version deleted]] > > ___ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
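A sketch of the diagnostic steps suggested above, on vegan's small dune data (the tuning values are illustrative, not recommendations):

```r
library(vegan)
data(dune)

## trace = 2 gives verbose output reporting which stopping criterion
## ended each random start
m <- metaMDS(dune, k = 3, trace = 2)

## if runs always stop on the iteration limit, raise maxit; keep the
## other criteria within sensible numerical bounds
m2 <- metaMDS(dune, k = 3, maxit = 500, sfgrmin = 1e-7, trace = FALSE)
m2$stress
```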
Re: [R-sig-eco] specaccum
> On 17 Mar 2016, at 20:21 pm, farga -not old one- <id.fa...@gmail.com> wrote: > > Hello, I'm working with a species discovery curve and want to use the Clench > equation on said curve. I'm using specaccum {vegan}, however I'm not sure if > specaccum uses said equation. I also require a summary of said equation > and both curves, the asymptote curve and the curve from my data, in a plot. > Any ideas, or is there any package I should use instead of vegan? > I don’t know what the “Clench equation” is, but Google claims that "Clench (1979) proposed the use of the Michaelis-Menten equation”. Michaelis-Menten is available in fitspecaccum() in vegan. Cheers, Jari Oksanen ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
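A sketch of the Michaelis-Menten fit via fitspecaccum(), with vegan's BCI data standing in for the poster's data:

```r
library(vegan)
data(BCI)

## species accumulation curve from the data
sac <- specaccum(BCI, method = "exact")

## fit the Michaelis-Menten model (the one Clench 1979 proposed)
fit <- fitspecaccum(sac, model = "michaelis-menten")
coef(fit)   # fitted asymptote and half-saturation constant

## both curves in one plot: the observed curve plus the fitted model
plot(sac)
plot(fit, add = TRUE, col = "red")
```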
Re: [R-sig-eco] NMDS axes scores
Contrary to common misbelief, NMDS ordination space is **metric**. In vegan, the ordination space (= the ordination result) is even guaranteed to be Euclidean (in isoMDS it can be Minkowski, but this is not allowed with vegan). What is non-metric is the regression from observed dissimilarities to the Euclidean distances in ordination space. The reason why we do not recommend using NMDS axes as independent beasts is that NMDS tries to preserve the *distances* among points. Any orthogonal rotation (= turning of the ordination space) will change the scores along the rotated axes, but retain the distances among points. The vegan NMDS result is rotated to principal components, but still you should avoid thinking that this makes the dimensions independent of each other, although the first maximizes the dispersion of points and the axes are orthogonal (uncorrelated). PCA ordination is Euclidean in the same way as NMDS. The differences from NMDS are that (1) only Euclidean distances among sampling units can be used in PCA (in NMDS you can use any adequate dissimilarity), and (2) the mapping from observed dissimilarities to Euclidean dissimilarities is linear (instead of non-metric). Try function stressplot() in vegan to see what this means — it is available both for NMDS and rda (PCA) results. CA is similar to PCA except that it is based on weighted Euclidean distances. I won’t go into mathematical details, but you can see ?wcmdscale in vegan for how to get CA as a weighted Euclidean ordination of Chi-square transformed data. PCA and CA have some ordering criteria for their axes, and therefore some people have used axes from those as independent beasts. I think this is dubious, too, but people do it all the time. PCA and CA also define a multivariate space, and taking only one axis as an independent object sounds strange, in particular if you take something other than the first axis. So what to do with NMDS axes?
If you take all NMDS axes and their interactions in a regression of type ~ axis1 + axis2 + axis1:axis2, then this is equal to fitting a linear trend surface, and the interaction term axis1:axis2 takes care that the result is invariant under rotation of the NMDS space. Function ordisurf() in vegan gives further ideas on how to fit surfaces to the NMDS *space* (instead of a simple axis). Also, if you think that some direction in NMDS (not necessarily parallel to the axes) is good and you have an indicator variable for it, you can use the MDSrotate() function in vegan to rotate your solution to that direction and then take that rotated axis as your explanatory variable. HTH, Jari Oksanen > On 11 Jan 2016, at 10:38 am, Martin Weiser <weis...@natur.cuni.cz> wrote: > > Hi Conny, > > AFAIK NMDS is *non-metric* and represents distances among objects, not > gradients along axes (known or unknown): distances along axes are > stretched as needed locally (NMDS works with rank order); even the order of > the elements along axes does not tell you anything. NMDS is great if you > want to say: Object A resembles object C more than it resembles object > B, even though C and B are quite similar. > Try this: run NMDS several times, aim for different numbers of axes (e.g. > 1,2,3,5,10) and note the scores of the objects along the first one. You > *may* get the same thing. > > If you need scores of the objects in the ordination, use something with > well defined metrics and axes, e.g. PCA, CA. > > HTH, > Martin > > On 9.1.2016 05:41, Conny wrote: >> Hi all, >> >> >> >> it has been frequently pointed out in this group that NMDS axis scores >> shouldn't be used individually for further analysis. >> >> I therefore would like to include both of my NMDS site scores as a response >> in a GLM model simultaneously. Unfortunately, I couldn't find any advice >> on how to actually do this.
I found a couple of papers using NMDS scores in >> GLMs, but they all seem to use them individually, fitting separate models to >> each of the ordination axes. >> >> >> >> I'm a bit at a loss here and any advice is very much appreciated, >> >> Conny >> >> >> [[alternative HTML version deleted]] >> >> ___ >> R-sig-ecology mailing list >> R-sig-ecology@r-project.org >> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
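The two vegan tools mentioned above (fitting to the NMDS space, and rotating towards an external variable) in a minimal sketch with the bundled dune data:

```r
library(vegan)
data(dune)
data(dune.env)

m <- metaMDS(dune, trace = FALSE)

## fit a smooth surface of an external variable over the NMDS *space*
ordisurf(m ~ A1, data = dune.env)

## or rotate the solution so that axis 1 parallels that variable,
## then use the rotated first axis as the explanatory direction
m.rot <- MDSrotate(m, dune.env$A1)
```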
Re: [R-sig-eco] NMDS axes scores
> On 11 Jan 2016, at 14:13 pm, Bob O'Hara wrote: > > > Whilst I'm filling bandwidth, I'm not sure Jari's suggestion that you need > the interaction term is correct. If a model is linear in axis1 and axis2, > then any rotation is also linear, i.e. the transformation is c_1 axis1 + c_2 > axis2. So you only need Y ~ axis1 + axis2. Basically, it's still a plane. The > interaction adds a curve: if you want that you also need to include quadratic > terms, i.e. Y ~ axis1 + axis2 + axis1^2 + axis2^2 + axis1:axis2. > Me neither: I think my suggestion to include the interaction term was wrong. This belief seems to be shared by the vegan authors, who define the linear trend surface without an interaction in ordisurf(). Cheers, Jari O. ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
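Bob's plane (no interaction term) can be sketched directly: take both axes together as predictors, with vegan's dune data standing in for the original problem:

```r
library(vegan)
data(dune)
data(dune.env)

m <- metaMDS(dune, trace = FALSE)
ax <- as.data.frame(scores(m, display = "sites"))  # columns NMDS1, NMDS2

## linear in both axes: a plane, invariant under rotation of the space
summary(lm(dune.env$A1 ~ NMDS1 + NMDS2, data = ax))
```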
Re: [R-sig-eco] NMDS axes scores
Zoltan, You’d better ask Bob… If you really want to get a synthetic (latent) variable with reduced noise, I think you really should be doing Factor Analysis. In particular, you should have confirmatory factor analysis, a.k.a. a measurement model in latent linear structural models. Often taking the first axes of a PCA does about the same, except that you rarely have sound model construction in PCA. I’m getting more liberal minded with age, though, and I can accept many kinds of analyses. Cheers, Jari O. > On 11 Jan 2016, at 17:29 pm, Zoltan Botta-Dukat > <botta-dukat.zol...@okologia.mta.hu> wrote: > > Dear Jari, > > What is your opinion about using the first few axes of a metric ordination? I'm > aware that it is meaningless to use the first two axes of an NMDS ordination > calculated for three dimensions. But in my experience, it is often useful to > use only the first few axes of a metric ordination instead of the raw data: no > ecologically relevant information is lost, only the 'noise' is reduced. > > Best wishes > > Zoltan > > 2016.01.11. 11:11 keltezéssel, Jari Oksanen írta: >> Contrary to common misbelief, NMDS ordination space is **metric**. In vegan, >> the ordination space (= the ordination result) is even guaranteed to be >> Euclidean (in isoMDS it can be Minkowski, but this is not allowed with >> vegan). What is non-metric is the regression from observed dissimilarities >> to the Euclidean distances in ordination space. The reason why we do not >> recommend using NMDS axes as independent beasts is that NMDS tries to >> preserve the *distances* among points. Any orthogonal rotation (= turning of >> ordination space) will change scores along rotated axes, but retain the >> distances among points. The vegan NMDS result is rotated to principal >> components, but still you should avoid thinking that this makes dimensions >> independent from each other, although the first maximizes the dispersion of >> points and axes are orthogonal (non-correlated).
>> >> PCA ordination is Euclidean in the same way as NMDS. The difference to NMDS >> are that (1) only Euclidean distances among sampling units can be used in >> PCA (in NMDS you can use any adequate dissimilarity), and (2) the mapping is >> linear (instead of non-metric) from observed dissimilarities to Euclidean >> dissimilarities. Try function stressplot() in vegan to see what this means — >> it is available both for NMDS and rda (PCA) results. CA is similar to PCA >> except that it is based on weighted Euclidean distances. I won’t go into >> mathematical details, but you can see ?wcmdscale in vegan to see how to get >> CA as a weighted Euclidean ordination of Chi-square transformed data. >> >> PCA and CA have some ordering criteria for their axis and therefore some >> people have used axes from those as independent beasts. I think this is >> dubious, too, but people do it all the time. The PCA/CA also define a >> multivariate space, and taking only one axis as an independent object sounds >> strange, in particular if you take something else than the first axes. >> >> So what to do with NMDS axes? If you take all NMDS axes and their >> interactions in a regression of type ~ axis1 + axis2 + axis1:axis2 then this >> is equal to fitting a linear trend surface, and the interaction term >> axis1:axis2 takes care that the result is invariant under rotation of NMDS >> space. Function ordisurf() in vegan gives further ideas how to fit surfaces >> to NMDS *space* (instead of simple axis). Also, if you think that some >> direction in NMDS (not necessarily parallel to the axes) is good and you >> have an indicator variable for that, you can use MDSrotate() function in >> vegan to rotate your solution to that direction and then take that rotated >> axis as your explanatory variable. 
>> >> HTH, Jari Oksanen >> >>> On 11 Jan 2016, at 10:38 am, Martin Weiser <weis...@natur.cuni.cz> wrote: >>> >>> Hi Conny, >>> >>> AFAIK NMDS is *non-metric* and represents distances among objects, not >>> gradients along axes (known or unknown): distances along axes are >>> stretched as needed locally (NMDS works with rank order), even order of >>> the elements along axes does not tell anything. NMDS is great if you >>> want to say: Object A resembles object C more than it resembles object >>> B, even though C and B are quite similar. >>> Try this: run NMDS several times, aim for different number of axes (e.g. >>> 1,2,3,5,10) and note the scores of the objects along the first one. You >>> *may* get the same thing. >&
Re: [R-sig-eco] Morisita horn similarity index
Vegan::vegdist actually has two related indices: the Morisita index (“morisita”), which assumes that the input data are integers (individuals, not percentages), and its Horn-Morisita variant (“horn”), which does not have this restriction. However, you should always check the formulae of indices, because the names may be used loosely. There is also a Horn index that is not implemented in vegan::vegdist. For these indices, vegan follows Krebs’s “Ecological Methodology”. cheers, Jari Oksanen > On 27 Nov 2015, at 15:41 pm, Zoltan Botta-Dukat > <botta-dukat.zol...@okologia.mta.hu> wrote: > > Dear Moses, > > Vegdist supposes that input data are numbers of individuals, not percentages. > So, I'm afraid it cannot help Moses. > > > > > 2015.11.27. 13:17 keltezéssel, Roman Luštrik írta: >> Hi, >> >> The `vegdist` function of the vegan package >> <http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/vegdist.html> implements >> the Morisita index. Is this what you're looking for? >> >> Cheers, >> Roman >> >> >> On Fri, Nov 27, 2015 at 4:55 AM, moses selebatso <selebat...@yahoo.co.uk> >> wrote: >> >>> Hello >>> I am trying to analyse diet overlap (level of similarity) between two >>> species. I have diet composition in %. I have tried to find the best tool, >>> and thought Morisita-Horn would do, but I can't find the right package for it. >>> Is this the best tool? >>> Thank you, >>> Moses >>> [[alternative HTML version deleted]] >>> >>> ___ >>> R-sig-ecology mailing list >>> R-sig-ecology@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology >> >> >> > > > -- > Botta-Dukát Zoltán > > Ökológiai és Botanikai Intézet > Magyar Tudományos Akadémia > Ökológiai Kutatóközpont > > 2163. Vácrátót, Alkotmány u. 2-4. > tel: +36 28 360122/157 > fax: +36 28 360110 > botta-dukat.zol...@okologia.mta.hu > www.okologia.mta.hu > > > Zoltán BOTTA-Dukát > > Institute of Ecology and Botany > Hungarian Academy of Sciences > Centre for Ecological Research > > H-2163 Vácrátót, Alkomány u. 2-4.
> HUNGARY > Phone: +36 28 360122/157 > Fax..: +36 28 360110 > botta-dukat.zol...@okologia.mta.hu > www.okologia.mta.hu > > ___ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
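The count-versus-percentage distinction above, as a short sketch with vegan's BCI counts (percentages derived from them for illustration):

```r
library(vegan)
data(BCI)

## "morisita" assumes integer counts (numbers of individuals)
d.mor <- vegdist(BCI, method = "morisita")

## "horn" (Horn-Morisita) also works for non-integer data,
## e.g. percentage diet compositions
pct <- decostand(BCI, method = "total") * 100  # rows as percentages
d.horn <- vegdist(pct, method = "horn")
```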
Re: [R-sig-eco] RDA
> On 26 Nov 2015, at 20:05 pm, Marcelino de la Cruz <marcelino.delac...@upm.es> > wrote: > > El 26/11/2015 a las 16:27, Richards, Christina escribió: >> Hello! >> >> That is very helpful and seems to work! Thank you!! >> >> I did not realize we could use raw data in capscale, > We could and we should! Currently, it is the only way to achieve it. > > is this true only because it is conditioning variables? > It seems that the current implementation of capscale only accepts a single > *data.frame* for both explanatory and conditioning variables > Yes, this is true: current and *future* implementations of capscale/rda/cca (and future dbrda) will only have one data= argument. However, in addition to variables in the data frame given in data=, you can mix variables in the work environment in your formula. If you think you need to have several data frames for the data= argument, please consider cbind(). cheers, Jari Oksanen ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] RDA
> On 26 Nov 2015, at 17:27 pm, Richards, Christina <c...@usf.edu> wrote: > > Hello! > > That is very helpful and seems to work! Thank you!! > > I did not realize we could use raw data in capscale, is this true only > because it is conditioning variables? At any rate it recapitulates our > analysis using partial Mantel in 2 species with 2 different results (one > significant, the other not), so I'm inclined to believe it's doing something > similar. > If you supply a raw community matrix (observations times species), capscale() will internally turn it into dissimilarities using the specified distance= method (which defaults to “euclidean”). You should verify that the selected distance= method is the one you need. The variables on the right-hand side of the formula must be in “raw” (observations times variables) format. They cannot be distances. Cheers, Jari Oksanen > ___ > From: Marcelino de la Cruz <marcelino.delac...@upm.es> > Sent: Wednesday, November 25, 2015 3:28 AM > To: Richards, Christina; r-sig-ecology@r-project.org > Subject: Re: [R-sig-eco] RDA > > Hi Christina, > > I think this could work: > > You should combine the *raw* y and z (i.e. not the Euclidean matrices) in > the same data.frame (e.g. "yz"), and call capscale like this: > > capscale(x ~ y1+y2+...+yn + Condition(z1+z2+...+z80), data=yz) > > > Cheers! > > Marcelino > > > > -- > Marcelino de la Cruz Rot > Depto. de Biología Y Geología > Universidad Rey Juan Carlos > Móstoles España > > > El 24/11/2015 a las 22:11, Richards, Christina escribió: >> >> >> Hello! >> >> >> We are trying to do a partial RDA with 3 matrices but our x matrix has a lot >> of missing data. We could use instead distance matrices which imputed >> missing data, but when we try to use capscale and the Euclidean matrices, it >> seems we have to use the formula x ~ y + Condition(z) and we cannot use a >> matrix of values for z.
We would like to identify the effect of y on x >> with z partitioned out, where: >> >> x = dataframe of DNA methylation with individuals listed in column 1 and 0/1 >> data across ~80 columns >> >> y = habitat for each individual >> >> z = dataframe of genetic loci with individuals listed in column 1 and 0/1 >> data across ~80 columns >> >> >> Christina Richards, Ph.D. >> University of South Florida >> Department of Integrative Biology >> 4202 East Fowler Avenue SCA 127 >> NES 107 (shipping) >> Tampa, FL 33620 >> (813)974-5090 >> (813)974-3263 FAX >> http://www.ecologicalepigenetics.com >> Twitter: @EcolEpig >> Facebook: Ecological Epigenetics ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
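To illustrate the point in Jari's reply about raw versus precomputed input, here is a sketch with vegan's dune example data (not the poster's data): the raw community matrix with distance=, and the equivalent precomputed dissimilarities.

```r
## Sketch of the two equivalent inputs to capscale(): a raw community
## matrix plus distance=, or precomputed Bray-Curtis dissimilarities.
## Uses vegan's built-in dune data for illustration.
library(vegan)
data(dune, dune.env)
m1 <- capscale(dune ~ Management, data = dune.env, distance = "bray")
m2 <- capscale(vegdist(dune, method = "bray") ~ Management, data = dune.env)
## both constrain the same Bray-Curtis dissimilarities by Management
```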
Re: [R-sig-eco] Differences between total constrained inertia (rda) and variance explained (varpart()) in vegan
Probably the difference is the adjustment: varpart() uses adjusted R-squared, but rda() output reports unadjusted proportions. The RsquareAdj() function gives both. Cheers, Jari Oksanen From: R-sig-ecology <r-sig-ecology-boun...@r-project.org> on behalf of Tim Richter-Heitmann <trich...@uni-bremen.de> Sent: 02 November 2015 15:58 To: r-sig-ecology@r-project.org Subject: [R-sig-eco] Differences between total constrained inertia (rda) and variance explained (varpart()) in vegan Dear list, when I perform RDA on a species set constrained with a set of predictors, which I have tested for their relevance by forward selection, I get these values for inertia:

              Inertia Proportion Rank
Total         0.12626    1.00000
Constrained   0.05493    0.43507   11
Unconstrained 0.07133    0.56493   48

So the amount of explained inertia is 0.05493/0.12626, or roughly 43%. When I group the same predictors into three groups (soil, vegetation, space) and perform varpart() on them, I get a total explained variance (sum of all explained fractions) of only 30%. This has been consistent for me in a variety of datasets. Can somebody explain the difference to me? Thank you! -- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
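To see both numbers that the reply contrasts side by side, a sketch with vegan's mite example data (any rda() fit will do; the Hellinger transformation and predictors here are just for illustration):

```r
## Sketch: RsquareAdj() reports both the unadjusted proportion shown in
## rda() output and the adjusted R2 that varpart() works with.
library(vegan)
data(mite, mite.env)
mod <- rda(decostand(mite, "hellinger") ~ SubsDens + WatrCont,
           data = mite.env)
RsquareAdj(mod)  # $r.squared (unadjusted) and $adj.r.squared (as in varpart)
```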
Re: [R-sig-eco] Quantifying widths of polygons
The standard R tool is ellipsoidhull in the cluster package (which is a recommended R package and should be installed with your R). That will draw a minimum volume ellipsoid enclosing your points. See its documentation for further hints (like MASS::cov.mve). The cluster package also provides the volume.ellipsoid function to find the volume. Obviously, these work in 2D. Cheers, Jari Oksanen > On 30 Sep 2015, at 17:38 pm, Baldwin, Jim -FS <jbald...@fs.fed.us> wrote: > > One metric for an "average width" that would be quick to calculate might be > the diameter of a circle that has the same area as the polygon. (Of course, > if the tree crowns are nowhere near circular, this won't likely be a useful > metric.) Maybe there might be a similar approach for finding an ellipsoid > with the same area to deal with your desire for an "eccentricity" metric. (A > fanciful approach might be to perform a principal components analysis on a > grid (or dense random selection) of points in the polygon and use the > "variance explained" for the two principal components to create the > semi-major and semi-minor axes of an ellipse. In any event, the usefulness > of any metric will be based on how well it predicts or is associated with > some other variable or variables of interest.) > > Jim > > > -Original Message- > From: R-sig-ecology [mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of > Alexander Shenkin > Sent: Wednesday, September 30, 2015 3:10 AM > To: r-sig-ecology@r-project.org > Subject: [R-sig-eco] Quantifying widths of polygons > > Hello all, > > I am working with data on tree crowns, and this data describes points > (vertices) around the polyhedron of the crown volume (think of the crown as > a single volume with vertices and faces). I can calculate the maximum z distance > between any 2 points (maximum depth) and the maximum x/y distance (maximum > width). These are useful metrics. 
I would also like to quantify an > "average" width of the polygon in 2D space (x/y only), as well as a metric > that would describe the "eccentricity" of the polygon. > But, I'm not sure how to go about doing that. > > In general, I've made the polyhedrons and polygons into convex shapes. > > I have considered getting a centroid, intersecting lines every 10 degrees > (for example) going through the centroid with the x/y polygon owin in > spatstat, and then analyzing those line lengths. But, I'm not sure that's > the right way to go, and maybe there are already tools out there to do this. > Any thoughts anyone might have would be very welcome! > > Thanks, > Allie > > (btw, I posted this on R-help (and on R-sig-ecology with no response), and it > was suggested that a list such as this would be more appropriate... apologies > for the cross-post) > > > library(rgl) > library(spatstat) > library(geometry) > x = > c(1.9,-1.4,1.5,1.8,2.2,0.2,0.6,-0.9,-3.7,1.3,-1.9,-3.4,3.7,2.1,-2.0,-1.9) > y = > c(-3.1,3.0,1.1,-1.3,1.0,0.0,1.4,1.6,2.3,-3.6,-1.5,-1.3,0.3,-2.1,0.2,-0.3) > z = c(5.5,4.5,4.3,4.8,6.7,5.8,7.4,6.2,3.5,2.9,4.0,3.7,3.2,3.0,3.1,8.4) > depth = max(z) - min(z) > width_max = max(dist(matrix(c(x,y),ncol=2))) > > xy_win = owin(poly=list(x=x,y=y)) > conv_win = convexhull(xy_win) > # from here, maybe draw lines every 10 degrees through a centroid? > # avg_width = ?? > # eccentricity = ?? 
> > # 3D plot of crown polyhedron (convex) > ps = data.frame(x=x,y=y,z=z) > crown.surf = t(convhulln(matrix(c(x,y,z),ncol=3))) > open3d() > rgl.triangles(ps[crown.surf,1],ps[crown.surf,2],ps[crown.surf,3],col=heat.colors(nrow(ps)),alpha=.2) > axes3d() > > ___ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology > > ___ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
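Putting Jari's suggestion together with the question's own 2D coordinates: a sketch using ellipsoidhull() and volume() from the cluster package. The eccentricity-like ratio at the end is one possible metric derived from the ellipse's semi-axes, not an established convention.

```r
## Sketch of the ellipsoidhull() route on the 2D crown coordinates from
## the question; the final axis ratio is an ad hoc eccentricity metric.
library(cluster)
x <- c(1.9,-1.4,1.5,1.8,2.2,0.2,0.6,-0.9,-3.7,1.3,-1.9,-3.4,3.7,2.1,-2.0,-1.9)
y <- c(-3.1,3.0,1.1,-1.3,1.0,0.0,1.4,1.6,2.3,-3.6,-1.5,-1.3,0.3,-2.1,0.2,-0.3)
eh <- ellipsoidhull(cbind(x, y))  # minimum-area ellipse enclosing the points
volume(eh)                        # in 2D this is the area of the ellipse
## the ellipse is {z: (z - loc)' cov^{-1} (z - loc) = d2}, so the
## semi-axis lengths are sqrt(d2 * eigenvalues of cov):
axes <- sqrt(eh$d2 * eigen(eh$cov)$values)
axes[1] / axes[2]                 # >= 1; larger means more elongated
```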
Re: [R-sig-eco] calculate dispersion index (betadisper, vegan)
Natalie & Zoltan, I'd be more worried about the effect of outliers on the area of convex hulls. Here is an _ad hoc_ definition: an outlier is a single point that has a huge effect on the area of an enclosing convex hull. That said, two points are certainly on a line, and three points can be (almost) on a line. Four or more points can be on a line, but they rarely are. In vegan, we have the function betadisper() that is the suggested way to measure the dispersion. It may be interesting to compare this to convex hulls. In vegan, summary() of an ordihull() result gives the areas of convex hulls (see ?ordihull), and the function ordiareatest() runs a permutation test on the areas of convex hulls. Cheers, Jari Oksanen On 30/07/2015, at 08:52 AM, Zoltan Botta-Dukat wrote: Hi Natalie, Be careful with the convex hull volume, because it is zero if the points lie on a line (and near to zero if the configuration is close to this), regardless of the magnitude of the distances. Zoltan On 29/07/2015, at 19:26, Natalie wrote: hi all I would like to calculate a dispersion index or similar from the results of the vegan function betadisper. I would like to know if there is a possibility to e.g. calculate the volume of the community dispersion (the area of the hull in the PCoA) from the betadisper results. Common is to show the distance to the centroid, but I have the impression the dispersion within communities is not that clear, especially if you have similar variance. The program PRIMER v5 calculates a dispersion index based on the similarity percentage. Similarity percentage is not an option since it is mainly based on the most counted species. Ideas and help are very much appreciated cheers natalie ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Botta-Dukát Zoltán Ökológiai és Botanikai Intézet Magyar Tudományos Akadémia Ökológiai Kutatóközpont 2163. Vácrátót, Alkotmány u. 2-4. 
tel: +36 28 360122/157 fax: +36 28 360110 botta-dukat.zol...@okologia.mta.hu www.okologia.mta.hu Zoltán BOTTA-Dukát Institute of Ecology and Botany Hungarian Academy of Sciences Centre for Ecological Research H-2163 Vácrátót, Alkotmány u. 2-4. HUNGARY Phone: +36 28 360122/157 Fax: +36 28 360110 botta-dukat.zol...@okologia.mta.hu www.okologia.mta.hu ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
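Jari's reply names three concrete tools; here is a sketch wiring them together on vegan's dune example data (not Natalie's communities). Note that ordiareatest() requires a reasonably recent vegan.

```r
## Sketch comparing betadisper() dispersion to convex-hull areas,
## using vegan's dune data and its Management factor for illustration.
library(vegan)
data(dune, dune.env)
d <- vegdist(dune)
bd <- betadisper(d, dune.env$Management)  # distances to group centroids
anova(bd)                                 # test homogeneity of dispersions
ord <- metaMDS(dune, trace = FALSE)
with(dune.env, ordiareatest(ord, Management))  # permutation test on hull areas
```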
Re: [R-sig-eco] distances in NMDS ordination space
Hi Kate, I think we should use the name Euclidean NMDS for the kind of NMDS we have in vegan, because its ordination space is strictly Euclidean. There is nothing non-metric in the ordination space. What is non-metric is the transformation of observed dissimilarities to optimize the fit to the Euclidean ordination space. The central term in the squared stress function is SUM (theta(d) - delta)^2, where delta are the Euclidean distances among points in the ordination, d are the observed community dissimilarities among sampling units, and theta() is a non-metric monotone function to transform d. A sum of squared differences is -- by definition -- a squared Euclidean distance, and hence our kind of NMDS is Euclidean. The isolated axes are not meaningful *because* the space is Euclidean. Euclidean space is invariant under rotation: you can rotate the axes and the distances among points do not change, and hence the configuration of points does not change. Any direction of the axes is just as good. In vegan, the default is to rotate the axes to principal components so that the first dimension is the longest. However, you can also rotate a dimension parallel to an environmental variable using the function MDSrotate. These rotations do not change the stress, the configuration or the distances among points. Such a rotated dimension can be more meaningful, and there is some justification in using that as a variable in some other analysis. To repeat: vegan NMDS is Euclidean NMDS and the NMDS ordination space is Euclidean. Because it is Euclidean, it is rotation-invariant and any rotation is equally good. Therefore the axes do not have a natural orientation in Euclidean space. The only thing that is non-metric is the transformation of community dissimilarities. That non-metric transformation is made to optimize the goodness of fit to the Euclidean ordination space. Cheers, Jari Oksanen On 16/07/2015, at 22:19 PM, Kate Boersma wrote: Hi all. 
I have a methodological question regarding non-metric multidimensional scaling. This is not specific to R. Feel free to refer me to another venue/resource if there is one more appropriate to my question. Correct me if I'm wrong: NMDS axes are non-metric, which is why NMDS frequently makes sense for community data, but it also means that distances in NMDS ordination space cannot be interpreted simplistically as they can in eigenvalue-based methods like PCA. This is why it is inadvisable (meaningless) to use NMDS axes as response variables in a linear modeling framework (e.g., with environmental variables as predictors). My question is this: Does that mean that it is also inadvisable to use distances among points in ordination space as response variables? My (potentially flawed) understanding: While the coordinates may not make sense in isolation, they should be meaningful relative to each other. In a 2D ordination, if communities A and B are closer together in ordination space than communities C and D, that means they have more similar species compositions. Therefore, I should be able to predict the distance between points in a linear modeling framework. Alternately, I could use the actual distances among communities from my dissimilarity matrix with a method like db-RDA. But I used NMDS over RDA or CCA for a reason. It seems more straightforward to use the distances from my NMDS ordination instead of generating new coordinates from a PCoA to fit an RDA framework (as in db-RDA)... but this logic only works if NMDS distances are informative. Are these comparable analyses? If not, why not? I'd love your opinions. 
Thank you, Kate -- Kate Boersma, PhD Department of Biology University of San Diego 5998 Alcala Park San Diego CA 92110 kateboer...@gmail.com http://www.oregonstate.edu/~boersmak/ ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
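The MDSrotate() point in Jari's reply can be sketched with vegan's mite example data: rotating the solution toward an environmental variable changes the orientation of the axes but leaves the stress untouched.

```r
## Sketch: rotate NMDS axis 1 parallel to an environmental variable;
## the stress (goodness of fit) is unchanged by the rotation.
library(vegan)
data(mite, mite.env)
ord <- metaMDS(mite, trace = FALSE)
rot <- MDSrotate(ord, mite.env$WatrCont)
c(ord$stress, rot$stress)  # equal: rotation changes orientation, not fit
```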
Re: [R-sig-eco] slope for rarefaction curve
Just for your information: a new version of vegan (2.3-0) is now available on CRAN, and it has the function rareslope() that finds the slope of the rarefaction curve at a given sample size. It can also calculate the slope at intermediate non-integer sample sizes. Together with this, the latest vegan also adds similar slope calculations for analytic species accumulation models (exact, rarefaction, coleman) plus all non-linear regression models in fitspecaccum() (function specslope()). Cheers, Jari Oksanen On 11/05/2015, at 12:09 PM, Jari Oksanen wrote: Basic algebra seems to lead to this function:

rarederatk <- function (x, k) {
    x <- x[x > 0]
    J <- sum(x)
    d <- digamma(J - k + 1) - digamma(J - x - k + 1)
    g <- lgamma(J - x + 1) + lgamma(J - k + 1) - lgamma(J - x - k + 1) - lgamma(J + 1)
    d <- d * exp(g)
    sum(d[is.finite(d)])
}

Here 'x' must be a vector of species abundances for a single site, and 'k' the sample size at which the derivative is evaluated. Here is a simple test that this seems to work (but please check):

library(vegan)
data(mite)
rarecurve(mite[1, ])              # first sampling point
sum(mite[1, ])                    # 140; evaluate at 126 individuals
y <- rarefy(mite[1, ], 126)       # 19.47011
b <- rarederatk(mite[1, ], 126)   # derivative 0.04032 (with warnings)
abline(y - 126 * b, b)            # matches the rarecurve plot

Cheers, Jari Oksanen From: R-sig-ecology <r-sig-ecology-boun...@r-project.org> on behalf of Zoltan Botta-Dukat <botta-dukat.zol...@okologia.mta.hu> Sent: 11 May 2015 08:49 To: r-sig-ecology@r-project.org Subject: Re: [R-sig-eco] slope for rarefaction curve Dear Simone, The function rarefy uses the function developed by Hurlbert, thus if you need the slope at a certain point (as your graph suggests) you can calculate the derivative of this function. It is not an easy job, because factorials must be differentiated. 
See hints here on how it can be done: http://math.stackexchange.com/questions/300526/derivative-of-a-factorial If you need the mean slope in an interval, simply calculate the difference between the values at the beginning and end of the interval, and divide the difference by the length of the interval. Zoltan On 10/05/2015, at 23:57, Simone Ruzza wrote: Dear all, apologies for the total beginner's question. I was wondering if anyone can give some advice on how to calculate the slope for the last 10% of the records of a rarefaction curve computed with rarefy from vegan. Here is a graphic representation of what I would like to do: https://dl.dropboxusercontent.com/u/33966347/figure.JPG I have seen that this has been done in a recent paper and I was wondering if anyone may have a code snippet to do that. Sorry, maybe this is something really obvious but I have not quite understood how to do it. thanks! Simone ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Botta-Dukát Zoltán Ökológiai és Botanikai Intézet Magyar Tudományos Akadémia Ökológiai Kutatóközpont 2163. Vácrátót, Alkotmány u. 2-4. tel: +36 28 360122/157 fax: +36 28 360110 botta-dukat.zol...@okologia.mta.hu www.okologia.mta.hu Zoltán BOTTA-Dukát Institute of Ecology and Botany Hungarian Academy of Sciences Centre for Ecological Research H-2163 Vácrátót, Alkotmány u. 2-4. HUNGARY Phone: +36 28 360122/157 Fax: +36 28 360110 botta-dukat.zol...@okologia.mta.hu www.okologia.mta.hu ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
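Tying the announcement back to Simone's original "slope of the last 10%" question, a sketch with the new rareslope() (vegan >= 2.3-0) on the mite example data: evaluate the slope at 90% of a site's total count.

```r
## Sketch: slope of the rarefaction curve near its end, using the
## rareslope() function announced above (requires vegan >= 2.3-0).
library(vegan)
data(mite)
n <- sum(mite[1, ])                      # total individuals at site 1
rareslope(mite[1, ], sample = 0.9 * n)   # slope at 90% of the total count
```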
Re: [R-sig-eco] vegan: goodness() and pRDA
This seems to be a bug in vegan. In fact, there are both bugs and bad design decisions. I have pushed a pull request to https://github.com/vegandevs/vegan which fixes the issues in this message plus a couple of other quirks. I'll test that fix a bit, and if everything looks OK, it will be merged and included in the next vegan release. Thanks for reporting this. Best wishes, Jari Oksanen From: R-sig-ecology <r-sig-ecology-boun...@r-project.org> on behalf of Christoph Eberhard Freiherr von Redwitz <christoph.redw...@uni-rostock.de> Sent: 05 March 2015 15:29 To: R-sig-ecology@r-project.org Subject: [R-sig-eco] vegan: goodness() and pRDA Hi all! I am using rda() in vegan and goodness() as a diagnostic tool. I want to know how well the species fit to the first two ordination axes. But I have problems combining partial RDA and goodness(). The species sums of CA and CCA don't add up to 1 (as they do in RDA without a Condition term). Furthermore, I get warnings when only one axis is left in the result of goodness(). Here is an example:

#--#
data(dune)
data(dune.env)
mod1 <- rda(dune ~ A1 + Moisture + Management, data = dune.env)
mod2 <- rda(dune ~ Manure + Condition(A1 + Moisture + Management), data = dune.env)
mod3 <- rda(dune ~ A1 + Condition(Manure + Moisture + Management), data = dune.env)
goodness(mod1, model = "CA", summarize = TRUE) + goodness(mod1, model = "CCA", summarize = TRUE)  # result as expected
goodness(mod2, model = "CA", summarize = TRUE) + goodness(mod2, model = "CCA", summarize = TRUE)  # only 4 RDA axes left; they don't end up at 1
# here comes the warning:
goodness(mod3, model = "CA", summarize = TRUE)   # works
goodness(mod3, model = "CCA", summarize = TRUE)  # only 1 RDA axis left; warning messages. The summarize option seems to be affected.
#--#

Two questions: Is the warning because of wrong usage of the method? Can I use goodness() with partial RDA? 1 - goodness(mod3, model = "CA", summarize = TRUE) should do it in this case? I hope the mail is clear and not an old story. Greetings! 
Christoph [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] CAPSCALE plot output
Alex, It is possible that the documentation in vegan answers this. You can try vegandocs("decision") and go to the chapter "Scaling in redundancy analysis", which also applies to capscale results. Cheers, Jari Oksanen From: R-sig-ecology <r-sig-ecology-boun...@r-project.org> on behalf of Alexander van den Bos <alexander.vanden...@hutton.ac.uk> Sent: 17 December 2014 13:02 To: r-sig-ecology@r-project.org Subject: [R-sig-eco] CAPSCALE plot output Hello, I have a quick question about the plot() function in capscale. This appears to plot the site scores from the output, but on closer inspection I've noticed that the values are not exactly the same as those produced by summary(). Does anyone have an explanation for this? Perhaps there is some scaling performed on the data before plotting it. Thanks Alex If you are not the intended recipient, you should not read, copy, disclose or rely on any information contained in this email, and we would ask you to contact the sender immediately and delete the email from your system. Although the James Hutton Institute has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and any attachments. The James Hutton Institute is a Scottish charitable company limited by guarantee. Registered in Scotland No. SC374831 Registered Office: The James Hutton Institute, Invergowrie Dundee DD2 5DA. Charity No. SC041796 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Community distance matrix deconstruction
Kate, Your question really may need some clarification, but at the moment it looks to me that you want to have row indices and column indices for your dissimilarities, and information about within/between dissimilarities. If this is what you want to have, it is an easy task. In the following I use a real data set from vegan to make this task a bit more general:

library(vegan)
data(mite, mite.env)
## dissimilarities
d <- dist(mite)
## row and column indices
row <- as.dist(row(as.matrix(d)))
col <- as.dist(col(as.matrix(d)))
## within same class: 1 = within, 0 = between
within <- with(mite.env, as.dist(outer(Shrub, Shrub, "==")))
## data frame -- the pedestrian way: snappier alternatives possible
df <- data.frame(row = as.vector(row), col = as.vector(col),
                 within = as.vector(within), dist = as.vector(d))
## see it
tail(df)
#       row col within      dist
# 2410   68  67      1 691.69502
# 2411   69  67      0 716.93863
# 2412   70  67      0 700.60973
# 2413   69  68      0  78.08329
# 2414   70  68      0  24.24871
# 2415   70  69      1  67.86015

I don't think you really want to have this: you only believe that you want to have this (mauvaise foi, like they used to say). If you only want to get summaries, check the function meandist in vegan. Cheers, Jari Oksanen On 13/12/2014, at 02:17 AM, Kate Boersma wrote: Hi all. I have a community analysis data manipulation puzzle for you... hopefully someone can help. Please let me know if this question needs clarification, has previously been answered, or would be better sent to a different list. Details follow. Thank you, Kate --- Here is a simplified version of my problem: I ran a community manipulation experiment with 7 reps of 2 treatments, for a total of 14 communities. Communities 1-7 are in Treatment 1 and 8-14 are in Treatment 2. I identified 5 taxa in the 14 communities and calculated a community dissimilarity matrix (14*14). Now I would like to decompose the distance matrix into a dataframe with the following column headings: community1s, community2s, withinORbetweenTRT, and distance. 
“Within or between treatment” indicates if the distance is between two communities within the same treatment or between the two treatments (values of 0 or 1). I did it by hand below to demonstrate, but my actual dataset has 100 communities so I need to figure out how to automate it...

df <- data.frame(cbind(1:14, 18:5, 3:16, 14:1, 16:3))  # random values
dist <- dist(df)
distance <- as.vector(dist)
community1s <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,
  4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,
  8,8,8,8,8,8,9,9,9,9,9,10,10,10,10,11,11,11,12,12,13)
community2s <- c(2,3,4,5,6,7,8,9,10,11,12,13,14,3,4,5,6,7,8,9,10,11,12,13,14,
  4,5,6,7,8,9,10,11,12,13,14,5,6,7,8,9,10,11,12,13,14,
  6,7,8,9,10,11,12,13,14,7,8,9,10,11,12,13,14,
  8,9,10,11,12,13,14,9,10,11,12,13,14,10,11,12,13,14,
  11,12,13,14,12,13,14,13,14,14)
# now I need a column for whether or not the comparison is within treatment or
# between treatments. I ordered the sites by treatment so sites 1-7 are in treatment 1
# and 8-14 are in treatment 2. 0 is within and 1 is between.
withinORbetweenTRT <- c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,
  1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,
  1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0)
# now I can assemble the dataframe:
final.df <- cbind(community1s, community2s, withinORbetweenTRT, distance)
final.df

I would appreciate any ideas! -- Kate Boersma, PhD Department of Biology University of San Diego 5998 Alcala Park San Diego CA 92110 kateboer...@gmail.com http://www.oregonstate.edu/~boersmak/ Kate S. Boersma, Ph.D. 
kateboer...@gmail.com http://people.oregonstate.edu/~boersmak/ Department of Biology University of San Diego 5998 Alcala Park San Diego, CA 92110 [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
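Since Jari's reply points to meandist() for the summaries Kate probably wants, here is a sketch with the same mite example data used above (the Shrub factor stands in for Kate's treatment factor):

```r
## Sketch of the meandist() summary suggested in the reply: mean
## within-group (diagonal) and between-group dissimilarities per factor level.
library(vegan)
data(mite, mite.env)
md <- meandist(vegdist(mite), mite.env$Shrub)
md           # matrix of mean within/between dissimilarities
summary(md)  # overall within/between means and related statistics
```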
Re: [R-sig-eco] help radfit()
Negative values of what? If it is AIC, it is not a problem. With AIC, only differences are meaningful. The observed values can be negative. Cheers, Jari Oksanen Sent from my iPad On 29.11.2014, at 19.56, Manoeli Lupatini <mlupat...@gmail.com> wrote: Dear all, I am working with the radfit() function in vegan on a data frame which contains a large number of zeros. When it is not zero, I have decimal numbers, which represent proportions. I am using the family Gamma. I got negative values for some models. Below an example of the output:

         Null Preemption   Lognormal        Zipf Mandelbrot
37   694.8004 -413.17581  -565.23622  492.945299 -482.93299
38   415.2763 -340.93024  -363.86510  469.338760 -379.28155
39  1221.0254 -310.94620 -1490.60959 1131.261212 -342.56346
40  1448.9174 -328.61790  -794.12383  761.840479 -439.40301
41  1007.8350 -376.34689  -493.21855  455.072273 -486.60574

Can someone explain if it is a problem to get these negative values? Thanks for your help, Manoeli -- Manoeli Lupatini Eng. Agrônoma Ms. em Ciência do Solo, Doutoranda em Ciência do Solo Programa de Pós-graduação em Ciência do Solo Departamento de Solos, UFSM, Santa Maria (55)81262295 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
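A base-R illustration of Jari's point: with a continuous likelihood the density can exceed 1, the log-likelihood becomes large and positive, and AIC = -2*logLik + 2k drops below zero with nothing wrong in the fit. The data here are simulated for demonstration only.

```r
## A perfectly ordinary model with negative AIC: the response varies on a
## narrow scale, so the Gaussian log-likelihood is large and positive.
set.seed(42)
x <- rnorm(100, mean = 0, sd = 0.01)  # narrow-scale continuous response
fit <- lm(x ~ 1)
AIC(fit)  # strongly negative; only AIC *differences* carry meaning
```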
Re: [R-sig-eco] Minimum Number of Observations for pcaCoDa?
Rich, It seems that the robust covariance matrix (I assume it is that) is not non-negative definite. The function robCompositions::pcaCoDa seems to use the function princomp of base R (its stats package) as the engine to get the principal components. If that function is used for raw data, it stops with an error message ('princomp' can only be used with more units than variables) if the number of columns is larger than the number of rows. However, it may still be able to handle these cases if you supply a covariance matrix, and the last (ncol - nrow) eigenvalues will be numerically zero -- that is, provided the covariance matrix is non-negative definite. Normal covariance matrices satisfy this (within numerical precision), but it seems that the robust covariance matrix does not. Actually, the warning is very clear and says: n < 2 * p, i.e., possibly too small sample size. The condition I put above was only that n > p, but this check requires that the number of rows is at least twice the number of columns. Because this was not the case, the warning came true and you got an error. So yes, you need more data if you wish to use this tool. Cheers, Jari Oksanen From: r-sig-ecology-boun...@r-project.org <r-sig-ecology-boun...@r-project.org> on behalf of Rich Shepard <rshep...@appl-ecosys.com> Sent: 21 November 2014 00:08 To: r-sig-ecology@r-project.org Subject: [R-sig-eco] Minimum Number of Observations for pcaCoDa? The compositional data sets have few observations: 4 to 7 rows each, but there are 5 parts (columns) for each row. I tried to use the robCompositions function pcaCoDa(). 
There was an error and warning generated:

( winters.biplot <- pcaCoDa(winters.coda) )
Error in princomp.default(xilr, covmat = cv, cor = FALSE) :
  covariance matrix is not non-negative definite
In addition: Warning message:
In covMcd(xilr, cor = FALSE) : n < 2 * p, i.e., possibly too small sample size

The matrix winters.coda has the counts:

  filter gather graze predate shred
1      3     27     3      11     1
2      3     28     3      13     2
3      3     43     7      15     1
4      4     54     6      24     3
5      3     26     4      22     5
6      1     39     2      18     2

Same results with the data file winters.acomp:

         filter     gather     graze      predate   shred
[1,] 0.0667      0.600     0.0667     0.244     0.0222
[2,] 0.06122449  0.5714286 0.06122449 0.2653061 0.04081633
[3,] 0.04347826  0.6231884 0.10144928 0.2173913 0.01449275
[4,] 0.04395604  0.5934066 0.06593407 0.2637363 0.03296703
[5,] 0.0500      0.433     0.0667     0.367     0.0833
[6,] 0.01612903  0.6290323 0.03225806 0.2903226 0.03225806
attr(,"class")
[1] "acomp"

Is there a minimum number of observations for PCA or was I using the incorrect data format? Rich ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
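A base-R sketch of the n-versus-p behaviour Jari describes (simulated data, no robCompositions needed): princomp() refuses raw data with fewer rows than columns, but accepts the rank-deficient yet non-negative definite covariance matrix.

```r
## Sketch: princomp() with n < p fails on raw data but runs from the
## covariance matrix, where the trailing eigenvalues are numerically zero.
set.seed(1)
X <- matrix(rnorm(4 * 6), nrow = 4)      # n = 4 observations, p = 6 variables
res <- try(princomp(X), silent = TRUE)   # error: more variables than units
inherits(res, "try-error")               # TRUE
pc <- princomp(covmat = cov(X))          # accepted: cov(X) is NND up to rounding
pc$sdev                                  # trailing sdevs are numerical noise
```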
Re: [R-sig-eco] Logistic regression with 2 categorical predictors
Andrew, If the 24 rows are the data you are analysing, I cannot replicate any of your significant results within that glm framework *if* I take the overdispersion into account. The full model with the age*test interaction is saturated and cannot be analysed at all, but the main effects model age+test can be analysed (or either term separately). However, the results are overdispersed, and you should use family=quasibinomial instead of family=binomial, and then use test="F" instead of test="Chisq":

anova(glm(cbind(prefer, avoid) ~ age + test, data = datafromthemail,
          family = quasibinomial), test = "F")

Analysis of Deviance Table

Model: quasibinomial, link: logit

Response: cbind(prefer, avoid)

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev      F Pr(>F)
NULL                    23     54.908
age   5  11.2352        18     43.673 0.9844 0.4591
test  3   1.5934        15     42.079 0.2327 0.8722

As you see, the Resid. Dev is much larger than the Resid. Df for both terms in this sequential model, and therefore you must use quasibinomial models and F tests -- and these give similar results as other tests. I could not get any results for the saturated models, and one of your examples (below in this thread) seemed to use only one level of test and only *six* observations: it was also saturated, as you had no replicates for your six age levels. You need replicates. However, I'm not sure I understood your data correctly. It looks like you have a certain number of animals, but their number is reduced with age, so that you have a kind of censored data set (animals not available in all cases). Perhaps somebody can propose a better analysis for such censored data, if it is like that. Cheers, Jari Oksanen

On 24/10/2014, at 10:41 AM, Andrew Halford wrote: Dear Gavin, Firstly let me say that I take offence at your bogus comment.
Just because I, like many others who interact on this list, often struggle conceptually with the overwhelming analysis choices that are required in our line of work doesn't give you the right to drop snide remarks as you see fit. My line of query is ALWAYS genuine from my perspective and I don't expect you or anyone else to belittle people on the public list! As it turns out, my issues are not resolved. To recap: I have run a bunch of choice chamber experiments with larval fish. Graphing up the ratio of 0/1 choices produces a plot which shows, to my eye, evidence of a result for some of the tests, with fish appearing to make defined choices in the later age groups for 2 of the tests. What appears to be happening is that because there are some empty cells in the later age x test interactions (the fish only took one option to the exclusion of the other), the errors are way out and hence preclude any chance of getting a significant result. If I add a single result to any of the zero cells to remove the blank, the analysis actually works more as I hoped. However, I doubt this is acceptable, so I am hoping to get some help with producing an effective analysis without having to manipulate the blank cells. Andrew

On 24 October 2014 04:08, Gavin Simpson ucfa...@gmail.com wrote: This all looks bogus to me; you've fit the data perfectly by fitting a saturated model - there are no residual degrees of freedom and (effectively) zero residual deviance. Things are clearly amiss because you have huge standard errors. You have 24 data points and fit a model with 23 coefficients plus the intercept; you've just replaced your data with 24 new data points (the values in the Estimate column of the summary() output). I really wouldn't bother interpreting it any further.
HTH G

On 21 October 2014 18:21, Andrew Halford andrew.half...@gmail.com wrote: Hi Thierry, The multiple comparisons ran just fine, but there was a ridiculous number of interaction combinations, all of which were non-significant even though there was a highly significant interaction term. I decided to remove test as a variable to simplify the analysis and run separate single explanatory variable logistic regressions. I have included a result below which is still producing an outcome I can't explain. Namely, why am I getting such a significant result for the ANOVA, but when I do the Tukey tests nothing is significant?

sg_habitat
  Age Prefer Avoid
1   1     17    14
2   2     20    10
3   3     14     9
4   4     13    12
5   5      0    18
6   6      0     5

model_sg <- glm(cbind(Prefer, Avoid) ~ Age, data = sg_habitat, family = binomial)
anova(model_sg, test = "Chisq")

Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(Prefer, Avoid)

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
NULL                     5     36.588
Age   5   36.588         0      0.000 7.243e-07 ***

mc_sg <- glht(model_sg, mcp(Age = "Tukey"))
summary(mc_sg)
Simultaneous Tests
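Andrew's puzzle (a hugely significant likelihood-ratio test but no significant Tukey contrasts) is a classic symptom of complete separation: the zero cells push the fitted log-odds towards minus infinity, so the Wald standard errors that glht() relies on explode, while the deviance-based test still works. A base-R sketch with the counts from the post:

```r
## Counts from the sg_habitat example in the thread
sg_habitat <- data.frame(Age    = factor(1:6),
                         Prefer = c(17, 20, 14, 13, 0, 0),
                         Avoid  = c(14, 10,  9, 12, 18, 5))

model_sg <- glm(cbind(Prefer, Avoid) ~ Age, data = sg_habitat, family = binomial)

## Likelihood-ratio test: residual deviance drops to ~0, highly significant
anova(model_sg, test = "Chisq")

## Wald summary: Age5 and Age6 have absurd standard errors because
## Prefer = 0 in those rows gives complete separation for those levels
se <- summary(model_sg)$coefficients[, "Std. Error"]
print(se)
```

Any Wald-based multiple comparison (such as glht's Tukey contrasts) divides by those inflated standard errors, which is why nothing comes out significant despite the clear pattern in the data.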
Re: [R-sig-eco] Regression with few observations per factor level
On 23/10/2014, at 18:17 PM, Gavin Simpson wrote: On 22 October 2014 17:24, Chris Howden ch...@trickysolutions.com.au wrote: A good place to start is by looking at your residuals to determine whether the normality assumptions are being met; if not, then some form of glm that correctly models the residuals, or a non-parametric method, should be used. Doing that could be very tricky indeed; I defy anyone, without knowledge of how the data were generated, to detect departures from normality in such a small data set. Try qqnorm(rnorm(4)) a few times and you'll see what I mean. Second, one usually considers the distribution of the response when fitting a GLM, not decides whether residuals from an LM are non-Gaussian and then moves on. The decision to use the GLM should be motivated directly by the data and the question at hand. Perhaps sometimes we can get away with fitting the LM, but that usually involves some thought, in which case one has probably already thought about the GLM as well. I agree completely with Gavin. If you have four data points and fit a two-parameter linear model, and in addition select a one-parameter exponential family distribution (as implied in selecting a GLM family), you don't have many degrees of freedom left. I don't think you would get such models accepted in many journals. Forget the regression and get more data. Some people suggested here that an acceptable model could be possible if your data points are not single observations but means of several observations. That is true: then you can proceed, but consult a statistician on how to proceed. Cheers, Jari Oksanen
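Gavin's qqnorm(rnorm(4)) point can be pushed a bit further: even a formal normality test has essentially no power at n = 4. A small simulation (my own illustration, not from the thread) applies shapiro.test() to clearly non-normal exponential samples of size four:

```r
set.seed(1)
## Draw many samples of size 4 from a strongly skewed (exponential)
## distribution and record how often Shapiro-Wilk rejects normality at 5%
pvals <- replicate(5000, shapiro.test(rexp(4))$p.value)
rejection_rate <- mean(pvals < 0.05)
print(rejection_rate)
```

The rejection rate stays far below any useful power, so with four observations the normality question simply cannot be answered from the residuals.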
Re: [R-sig-eco] SIMPER problem: invalid 'nrow' value (too large or NA)
On 14/10/2014, at 21:41 PM, mastratton wrote: markusvlindh wrote: Dear all, I'm having difficulty applying a SIMPER analysis found in vegan, following the example provided in the help page of simper. I keep receiving the following error message:

Error in matrix(ncol = P, nrow = n.a * n.b) : invalid 'nrow' value (too large or NA)

My data consist of a community matrix with 200 species and 43 dates (class = data.frame), and my groups consist of factors with in total 12 levels. A mock example could be the following, which is working:

library(vegan)
community <- data.frame(replicate(43, sample(0:1000, 200, rep = TRUE)))
groups <- as.factor(sample(c("Alpha", "Beta", "Gamma", "Epsilon", "Bact", "Actino",
                             "Verr", "Unclass", "Cyano", "Plancto", "Eury", "Chloro"),
                           200, rep = TRUE))
simper_test <- simper(community, groups)
summary(simper_test)

But please see the attached files for true data that is not working. Could someone please assist in what is the problem with my data. Kind regards! Markus, I was getting the same error message and discovered that simper() is not written to handle an input 'group' factor that has one or more unique values with only one occurrence. Your data (reattached) have two of these instances. ... The source code for simper() can be modified to allow these instances: Yes, the source can be modified *and* it has been modified to cope with one-member groups. You can install the modified version of vegan for Windows, or if you have programming tools, for other OS's too, using:

install.packages("vegan", repos = "http://R-Forge.R-project.org")

This is a development version, but we are going to release a new CRAN version of vegan later this month, and the R-Forge version is very close to the release version. (Looks like the build system in R-Forge is down again, and has been for a week, so this is not the most recent version on R-Forge, but it is mostly safe.)
Cheers, Jari Oksanen
Re: [R-sig-eco] [vegan] Error in as.vector(x, mode) : cannot coerce type 'builtin' to vector of type 'any' when performing a cca on a dataframe
Dear Tim Richter, Some minimal information is the version of vegan you are using: we have just redesigned the permutation functions in github and R-Forge, and it is essential to know where to look (and we are not too keen to look at code that will be thrown away in the next release within a few days). Secondly, you could give us a traceback() of the error. When you get the error, just type:

traceback()

then copy and paste the message and send it to us. This at least tells us where you get the error. Thirdly, this is a question specific to the vegan package and may not be too exciting for the general public. You may consider contacting the package maintainers directly. Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org r-sig-ecology-boun...@r-project.org on behalf of Tim Richter-Heitmann trich...@uni-bremen.de Sent: 13 October 2014 16:49 To: r-sig-ecology@r-project.org Subject: [R-sig-eco] [vegan] Error in as.vector(x, mode) : cannot coerce type 'builtin' to vector of type 'any' when performing a cca on a dataframe

Hi there, I have three dataframes, which have the same structure (data.frame, 358 observations of 340 variables, every column is numeric), and are basically submatrices of a parental dataset. I can perform a CCA on all of them (with the explanatory dataset being abio):

cca1_ab <- cca(df1 ~ ., data = abio)

I can also perform an anova on all three cca results:

anova(cca1_ab)

However, if I do

sig1_ab <- anova(cca1_ab, by = "term", perm = 200)

which I believe gives me confidence values for the fitting of the explanatory variables, only one of the cca results gives an unexpected

Error in as.vector(x, mode) : cannot coerce type 'builtin' to vector of type 'any'

The same is true when I try to perform an ordistep forward selection. The other two are working fine. It's hard to come up with an explanation, as the datasets look and behave the same otherwise. Any idea why this could happen?
I am sorry for not providing data for reproduction, as the data sets are pretty large. -- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069
Re: [R-sig-eco] predict NMS scores for new samples
On 12/10/2014, at 23:13 PM, Dave Roberts wrote: Hi Jonathan, If you're using the metaMDS function in vegan with the monoMDS engine then it's possible. I have posted a function (monomds) at the bottom of http://ecology.msu.montana.edu/labdsv/R/labs/lab9/lab9.html that shows how to generate a labdsv:::nmds object from vegan's monoMDS function (courtesy of Peter Minchin). You will have to have package vegan loaded to get the monoMDS FORTRAN code loaded. Then you can use the function addpoints.nmds (also at the bottom of that page) to add points to an existing nmds. I'm not convinced that it's a good idea, but I worked with someone who needed it to fulfil contract obligations, so I wrote it. It looks like a very good idea to me. Cheers, Jari Oksanen
Re: [R-sig-eco] adonis() and collinearity
Dear Jonas Kuppler, On 17/09/2014, at 16:06 PM, Jonas Kuppler wrote: If I use only continuous explanatory variables in the adonis() function, it is like a multiple linear regression with dissimilarities. In multiple regression with highly correlated explanatory variables I have the problem of multicollinearity and an increasing standard error of the coefficients. Is it the same for the adonis() function? Yes, it is. I would think so since adonis() is analogous to a MANOVA, but I am not sure. And is there any possibility to estimate the influence of multicollinearity in adonis(), like the variance inflation factor for lm? Not directly: adonis does not return all the intermediate results that would be needed to find the VIF. It could be modified to return those items, and then it would be very simple to calculate the VIF. Not yet, though (nor is it in the plans). There are some tricks that may work even with the current code: check Legendre & Legendre, latest edition. Cheers, Jari Oksanen
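Until adonis exposes the needed pieces, VIFs for the explanatory variables themselves can be computed by hand, since they depend only on the predictor matrix, not on the dissimilarities. A base-R sketch (my own illustration; the variable names are made up): the VIF of each predictor is the corresponding diagonal element of the inverse correlation matrix, which equals 1 / (1 - R²) from regressing that predictor on the others.

```r
set.seed(42)
## Fake environmental predictors: x2 is nearly collinear with x1
n  <- 30
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # strongly correlated with x1
x3 <- rnorm(n)                  # independent of the others
X  <- cbind(x1, x2, x3)

## VIF_j = [solve(cor(X))]_jj, i.e. 1 / (1 - R^2_j)
vif <- diag(solve(cor(X)))
print(round(vif, 2))  # x1 and x2 large, x3 near 1
```

This diagnoses collinearity among the constraints before they ever reach adonis; it says nothing about the dissimilarity side of the model.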
Re: [R-sig-eco] simple question about CCA
Simone, If you continue your first example and *look* at the result object, you'll see:

ccatest
Call: cca(formula = dat[8:144] ~ x1 + x2 + x3 + x4 + x5 + x6 + x7, data = dat)

              Inertia Proportion Rank
Total           13.76       1.00
Constrained     13.76       1.00   62
Unconstrained    0.00       0.00
Inertia is mean squared contingency coefficient
Some constraints were aliased because they were collinear (redundant)

This shows you, among other things, that the number of constraints is higher than the number of observations: there is no residual variation (Unconstrained Inertia = 0), and Some constraints were aliased because they were collinear (redundant). You won't see those aliased constraints, and therefore the last ones are dropped. In this case, the aliased constraints are:

alias(ccatest, names = TRUE)
[1] "x6SCL" "x6SL"  "x6ZC"  "x6ZCL" "x7"

That is, four levels of x6, and x7. These are not shown. If you change the order of the constraints, some other variables may be among those last four that are not shown and cannot be analysed. You need either more data (more observations) or a more sensible model with fewer constraints. The first way (collect more data) is more heroic, but the second is more clever. If you look at your data (dat), you see that x5 is a factor with 54 levels and x6 is a factor with 10 levels. You have 63 observations, and these two together, with 64 levels, are able to completely explain everything and anything in these data: you run out of degrees of freedom. Sorry for top-posting and bad formatting: this is MS Outlook. Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org r-sig-ecology-boun...@r-project.org on behalf of Simone Ruzza simone.ruzz...@gmail.com Sent: 11 September 2014 13:24 To: r-sig-ecology@r-project.org Subject: [R-sig-eco] simple question about CCA

Dear all, apologies for the simplicity of my question, maybe it has been asked many times, but I am a total novice to CCA.
I have performed a CCA using a series of environmental variables that comprise a mixture of categorical and non-categorical variables. What I do not understand is why, when I change the order of my variables and plot the results, a variable disappears from the CCA biplot, i.e. the last one, a continuous variable. I realise that this might be a very simple question, so I would be happy even with a reference where to find an answer. Below some code showing what is happening. Thanks in advance, Simone

require(RCurl)
require(vegan)
x <- getURL("https://dl.dropboxusercontent.com/u/33966347/testdata.csv")
dat <- read.csv(text = x)
# example 1: x7 disappears from the plot (note that x5 and x6 are categorical)
ccatest <- cca(dat[8:144] ~ x1 + x2 + x3 + x4 + x5 + x6 + x7, data = dat)
plot(ccatest)
# example 2: x7 is present in the plot
ccatest1 <- cca(dat[8:144] ~ x7 + x1 + x2 + x3 + x4 + x5 + x6, data = dat)
plot(ccatest1)
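The same aliasing happens in any linear model when the predictors exhaust the degrees of freedom; cca just reports it as aliased constraints. A tiny base-R sketch with synthetic data: once a factor already spans all observations, any further term is aliased and its coefficient comes back NA.

```r
set.seed(7)
d <- data.frame(y = rnorm(6),
                f = factor(1:6),   # one level per observation: saturates the model
                x = rnorm(6))
fit <- lm(y ~ f + x, data = d)
print(coef(fit))            # coefficient for x is NA: it is aliased
print(alias(fit)$Complete)  # shows how x is confounded with the f terms
```

Which term ends up NA depends only on the order in the formula, exactly as Simone observed with x7: the model keeps terms left to right until the degrees of freedom run out.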
Re: [R-sig-eco] Manual Rarefaction and CI's
Nicholas, Your approach looks perfectly OK. However, rarefied sampling is normally done *without* replacement. It looks to me that a similar analysis could be done with the vegan::specaccum function. However, that uses sampling without replacement. About your specific questions: you can use the quantile() function in base R to get the desired confidence intervals. The CI limits should be given as argument probs to quantile(). For the quantile function you should return a vector of richness values instead of a table. You could get something similar with table output as well, but you may need to write that yourself. It is expected that the CI gets narrower as the subsample size increases. In classic rarefaction sampling without replacement, there is only one subsample of the original sample size, and therefore there is no variance, but you get the observed data. As the subsampled proportion approaches 1, the number of possible samples decreases, and so does the variance of the subsamples. With replacement this may not happen, as some units are there twice or more and some are missing. A subsample of the original size drawn with replacement should contain 63.2% of your original units. This also means that because you duplicate (or multiplicate) some sampling units and omit some, you will systematically underestimate richness when you subsample with replacement. This is why we normally use subsampling without replacement. However, the larger subsamples tend to become more similar, in particular with a data set like the one you described. This is necessary and correct. This is the most annoying thing about CIs in rarefaction to me. The rarefaction variation *only* describes the randomness of subsampling from a fixed sample. It does not describe the variance of that fixed sample, or the real variance of your observed richness. By real variability I mean the variance you would observe if you replicated your sampling in Nature and got completely new, independent samples from the same real community.
I am afraid people use those rarefaction CIs and variances as measures of sample variability, which is really misleading -- they actually describe only the artifactual variance of subsampling from fixed samples, when these fixed samples are regarded as error-free and as having zero variance of species richness. Using these variances to infer differences in species richness in Nature is wrong and misleading. I have tried to spell out that warning in the vegan manual pages, but people seem to ignore those parts of the text. Cheers, Jari Oksanen

On 28/05/2014, at 20:05 PM, Nicholas J. Negovetich wrote: I have a question related to rarefaction of our samples. Unlike all of the examples of which I'm aware, I'm not working with 'sites', per se. Instead, I'm working with the parasites of snails. Snails are infected with at most 1 parasite species, and each pond/stream can hold several parasite species. The difficulty comes when comparing parasite richness between sites (Pond A vs Pond B) when sampling effort (# snails collected) differs. Rarefaction provides a means to standardize effort at the lowest sample size so that we can test whether parasite richness differs between sites. I'm familiar with the vegan package and how it performs rarefaction. But I can't perform this type of analysis because (1.) 'uninfected' snails would be considered empty patches, and (2.) max richness will be 1 (only one parasite species can (usually) infect a snail). To counter this problem, I've tried to rarefy my samples using randomization methods. The script is below. In summary, I sampled (with replacement) my dataset and calculated the number of parasite species in each sample. I repeated this 100 times and calculated the mean richness for a given sample size. I did this for sample sizes 1-50 (smallest sample size is 50 individuals). I have a few questions related to this analysis. 1. Does this appear correct? 2. How can I generate CIs using this method?
I normally calculate the 2.5% and 97.5% quantiles when performing randomization, but the nature of our data does not lend itself well to quantile calculations. Could I assume that my bootstrapped means follow a normal distribution? If not, then what would be the best method for generating confidence intervals? 3. This last one is the primary reason for this post. When performing this analysis on my real datasets, I've noticed a peculiar thing. In one sample, my variance of the bootstrapped samples converges to zero as sample size approaches the true sample size (this is for my sample of the smallest sample size). Relatedly, I've noticed some (but not all) of my CI narrowing as my sample size approaches the actual size. This suggests that I'm not doing something correctly, but I'm not sure how to correct it (or if I even need to correct it). The alternative for me is to scrap the rarefaction and perform an alternative analysis, such as randomizing
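The randomization Nicholas describes can be written compactly in base R; the sketch below uses made-up snail data (one species label per snail, NA for uninfected) and, following Jari's advice, subsamples *without* replacement and takes quantile() of the randomized richness values as the interval.

```r
set.seed(42)
## One snail per element: the parasite species it carries, NA if uninfected
snails <- sample(c("spA", "spB", "spC", NA), size = 200, replace = TRUE,
                 prob = c(0.20, 0.10, 0.05, 0.65))

## Richness of one random subsample of n snails, drawn without replacement
rarefied_richness <- function(x, n)
  length(unique(na.omit(sample(x, n, replace = FALSE))))

## Distribution of richness at the standardized sample size
n_target <- 50
boot <- replicate(999, rarefied_richness(snails, n_target))

## 95% interval straight from the quantiles of the randomized values
print(quantile(boot, probs = c(0.025, 0.5, 0.975)))
```

As Jari warns, this interval only quantifies the subsampling randomness within one fixed collection of snails, not the sampling variance you would see with new, independent field samples.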
Re: [R-sig-eco] how are the species scores of PCA computed in the vegan package?
Dear Claire Della Vedova, It seems that you have searched in many places, except in the vegan documentation. Look at the vignette on Design decisions, section Scaling in redundancy analysis. Cheers, Jari Oksanen

From: r-sig-ecology-boun...@r-project.org r-sig-ecology-boun...@r-project.org on behalf of claire della vedova claire.della-ved...@magelisgroup.com Sent: 27 May 2014 11:17 To: r-sig-ecology@r-project.org Subject: [R-sig-eco] how are the species scores of PCA computed in the vegan package?

Hi everybody, I'm working on the PCA approach, and comparing outputs from the ade4 and vegan packages. I'm OK with the normalization of the variable coordinates coming from the ade4 outputs (with $co, coordinates are scaled to eigenvalues; with $c1, coordinates are scaled to 1). But I have difficulties understanding how the species scores are computed in vegan's outputs with the scaling 1 or 2 options, and what the message concerning scaling means, especially the 'General scaling constant of scores'. For example:

'Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions
* General scaling constant of scores: 4.226177'

I've searched the archives of R-sig-ecology, Cross Validated and Stack Overflow, and found nothing that helped me. If somebody has some information about it, I would greatly appreciate some help. All the best.
Claire Della Vedova

Here are some parts of my code:

library(ade4)
library(vegan)
doubs.env <- read.csv('http://www.sci.muni.cz/botany/zeleny/wiki/anadat-r/data-download/DoubsEnv.csv', row.names = 1)

## with ade4 ##
pca.ad <- dudi.pca(doubs.env, scale = TRUE, center = TRUE, scann = FALSE, nf = 3)
# eigenvalue of the first eigenvector
pca.ad$eig[1]
[1] 5.968749
# variable coordinates on the first eigenvector
pca.ad$co[,1]
 [1]  0.85280863 -0.81918008 -0.4528      0.75214647 -0.04996375  0.70722171
 [7]  0.83048310  0.90260821  0.79011263 -0.76485397  0.76373149
# check the normalization of loadings
sum(pca.ad$co[,1]^2)
[1] 5.968749   # = coordinates scaled to eigenvalues
# variable normed scores on the first eigenvector
pca.ad$c1[,1]
 [1]  0.34906791 -0.33530322 -0.18535177  0.30786532 -0.02045094  0.28947691
 [7]  0.33992972  0.36945166  0.32340546 -0.31306670  0.31260725
# check the normalization
sum(pca.ad$c1[,1]^2)
[1] 1   # = coordinates scaled to 1

## with vegan ##
pca.veg <- rda(doubs.env, scale = TRUE)
# species scores for the first eigenvector, with scaling 1
summary(pca.veg, scaling = 1)
       das        alt        pen        deb         pH        dur        pho
 1.4752228 -1.4170507 -0.7833294  1.3010933 -0.0864293  1.2233806  1.4366031
       nit        amm        oxy        dbo
 1.5613681  1.3667687 -1.3230752  1.3211335
'Scaling 1 for species and site scores
* Sites are scaled proportional to eigenvalues
* Species are unscaled: weighted dispersion equal on all dimensions
* General scaling constant of scores: 4.226177'
summary(pca.veg, scaling = 2)[1][[1]][,1]
        das         alt         pen         deb          pH         dur
 1.08668311 -1.04383225 -0.57701847  0.95841533 -0.06366582  0.90117039
        pho         nit         amm         oxy         dbo
 1.05823501  1.15013974  1.00679334 -0.97460773  0.97317743
'Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions
* General scaling constant of scores: 4.226177'
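What "scaled proportional to eigenvalues" means can be checked with base R alone (a generic PCA sketch, not the Doubs data): unit-length eigenvectors multiplied by the square roots of their eigenvalues give eigenvalue-scaled scores. Each package then multiplies by its own overall constant, which is why the ade4 and vegan numbers differ by a constant factor per axis rather than agreeing digit for digit.

```r
set.seed(1)
X <- scale(matrix(rnorm(30 * 5), nrow = 30))  # centred, standardized data
p <- prcomp(X)

unit_loadings <- p$rotation    # columns have unit length (cf. ade4 $c1)
eigvals       <- p$sdev^2      # eigenvalues of the correlation matrix

## Eigenvalue-scaled loadings (cf. ade4 $co): unit loadings * sqrt(lambda)
scaled_loadings <- unit_loadings %*% diag(p$sdev)

## Check both normalizations on axis 1
print(sum(unit_loadings[, 1]^2))    # 1
print(sum(scaled_loadings[, 1]^2))  # equals the first eigenvalue
print(eigvals[1])
```

vegan's "General scaling constant" is one further multiplier applied on top of this; the vegan Design decisions vignette that Jari points to gives its exact definition.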
Re: [R-sig-eco] Measurement distance for proportion data
Typical dissimilarity indices are of the form difference/adjustment, where the adjustment takes care of forcing the index into the range 0..1, and handles varying total abundances/richnesses. If you have proportional data, you may not need the adjustment at all, but you can use just about any index. That is, it does not matter so awfully much which index you use, and for many practical purposes it does not matter if the data are proportional. Actually, several indices may be equal to each other (or differ only by a constant multiplier) with proportional data. For instance, Bray-Curtis and Kulczynski are then identical, and Manhattan is simply twice them. All you need to decide is which name you use for your index. The analysis of proportional data usually covers very different classes of models than ANOSIM and friends. Dissimilarities are not usually involved in these models. One aspect of proportional data is that only M-1 of M variables really are independent. However, this really needs to be taken into account only if M is low. I have no idea how that is in your case. Cheers, Jari Oksanen

On 13/05/2014, at 15:32 PM, Zbigniew Ziembik wrote: I am not sure, but it seems that your problem is related to compositional data analysis. You can probably use the Aitchison distance to estimate separation between proportions. Take a (free) look at: http://www.leg.ufpr.br/lib/exe/fetch.php/pessoais:abtmartins:a_concise_guide_to_compositional_data_analysis.pdf or http://dugi-doc.udg.edu/bitstream/10256/297/1/CoDa-book.pdf or (commercial): Aitchison, J. 2003. The Statistical Analysis of Compositional Data. The Blackburn Press. Best regards, ZZ

On Mon, 2014-05-12 at 16:37 +, Javier Lenzi wrote: Dear all, I'm doing data exploration on seabird trophic ecology data and I am using ANOSIM to evaluate possible differences in diet between breeding and non-breeding seasons.
As a starting point I am using some classical indexes such as %FO (relative frequency of occurrence), N (number of prey counted in the pooled sample of pellets), %N (N as a percentage of the total number of prey of all food types in the pooled sample), V (total volume of all prey in the pooled sample), and IRI (index of relative importance). I have a concern about which similarity measurement I should use in ANOSIM for those indexes that are proportions. I am aware that, for instance, Bray-Curtis is used for count data (e.g. N) and Jaccard is used for presence-absence data (which I don't have), however I did not find a proper distance measurement for proportion data. Please, could you help me to find a proper distance measurement for these proportion data? Thank you very much in advance. Regards, Javier Lenzi
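Jari's point about indices coinciding on proportional data is easy to verify in base R (index formulas written out by hand; vegan's vegdist should give the same numbers): when the rows sum to 1, Bray-Curtis and Kulczynski are exactly equal, and Manhattan is exactly twice them.

```r
## Hand-written index formulas for two proportion vectors
bray       <- function(x, y) sum(abs(x - y)) / sum(x + y)
manhattan  <- function(x, y) sum(abs(x - y))
kulczynski <- function(x, y)
  1 - 0.5 * (sum(pmin(x, y)) / sum(x) + sum(pmin(x, y)) / sum(y))

x <- c(0.20, 0.30, 0.50)   # each row sums to 1 (proportional data)
y <- c(0.10, 0.40, 0.50)

print(bray(x, y))           # 0.1
print(kulczynski(x, y))     # 0.1 -- identical to Bray-Curtis
print(manhattan(x, y) / 2)  # 0.1 -- Manhattan is exactly twice Bray-Curtis
```

The reason is that with row sums fixed at 1 the denominators become constants: sum(x + y) = 2 and sum(pmin(x, y)) = 1 - sum(abs(x - y))/2, so all three reduce to the same function of sum(abs(x - y)).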
Re: [R-sig-eco] calculating standard error of coefficients from adonis model
Rafter, I am not Gavin, but I'll step in and speak for him. Gavin quite clearly suggested that you bootstrap adonis. The boot package is intended to generate bootstrap samples and to estimate the bootstrap confidence limits for *any* function. The boot package does not contain those functions, but you can use it to analyse adonis. At the moment, I have no idea how to directly calculate the SEs you asked for, and even if I had, I am sure that bootstrapping would give more reliable estimates. We would need to make very strong and very wrong assumptions if we tried to estimate those SEs directly within the adonis code, and bootstrapping is certainly better. What is sure is that the equations you gave in your message won't work. I have never used these adonis coefficients, and I have no idea how to use them, and therefore I don't have any opinion about their use. Cheers, Jari Oksanen

On 09/05/2014, at 03:02 AM, Rafter wrote: Hi Gavin, Thanks so much for your reply. By that do you mean move away from adonis, and bootstrap a more conventional model that calculates parameter standard errors? Adonis has been the choice for me because of its ability to handle multivariate, distribution-independent data in x and y. If you're suggesting using another model, do you have any suggestions about type? If you're not suggesting moving away from adonis, would you perhaps slightly unpack your suggestion? Thanks very much for your time and consideration. Warmly, Rafter
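The kind of bootstrap Jari points to is mechanical with the boot package (a recommended package shipped with R). The sketch below is generic, not adonis-specific: the statistic function refits a model on row-resampled data and returns its coefficients; for adonis you would resample the rows of the community matrix and environmental data together inside `stat` and return the adonis coefficients instead of the lm used here for illustration.

```r
library(boot)  # recommended package shipped with R

set.seed(1)
d <- data.frame(x = rnorm(40))
d$y <- 2 * d$x + rnorm(40)

## Statistic: refit the model on a row-resampled data set, return coefficients.
## For adonis, resample rows of community + environment here and return
## the adonis coefficients instead.
stat <- function(data, idx) coef(lm(y ~ x, data = data[idx, ]))

b <- boot(d, stat, R = 999)

## Bootstrap standard errors of the coefficients
print(apply(b$t, 2, sd))

## Percentile CI for the slope (second returned statistic)
print(boot.ci(b, type = "perc", index = 2))
```

The columns of b$t hold the coefficient values over the 999 resamples, so their standard deviations are the bootstrap SEs Rafter was after, with no distributional assumptions beyond exchangeability of the rows.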
Re: [R-sig-eco] residuals in RDA, and test for spatial autocorrelation
Dear Tracy, You can see if Helene Wagner's mso() function in vegan satisfies your needs for analysing spatial dependence. Reference and further description in ?mso. Cheers, Jari Oksanen On 02/04/2014, at 00:12 AM, Pinney, Tracy A wrote: Hello List, I have two questions. 1.) How do I generate a matrix of residuals in RDA (I am using rda() in vegan)? 2.) How can I test for spatial dependence (spatial autocorrelation) in the response matrix of a multivariate analysis (RDA in my case)? I have read that you can look at the trace of the variogram matrix as a multivariate alternative to Moran's I, but I don't know the details of how to make this test happen using R. Thanks for any help! Tracy
Re: [R-sig-eco] cross validation in CoCA and CCA
Jesse, I do not know what you mean by CV in this context, but basic cross validation can be done with the vegan functions cca(), rda() and capscale(). These functions have a predict method that accepts 'newdata', and using new data allows cross validation. They also have a calibrate() function that can directly estimate the values of the predictor variables from community data, and this also has 'newdata'. So you can build (train) a model and then use independent 'newdata' to use (test) the model. However, we do not have any generic crossvalidate(object, data, k, …) function for a canned cross validation process: you have to do this by hand. Neither do we have any functions for multistep CV or structured CV where some of the external variables are known and others predicted/calibrated. However, basic facilities for hand crafting such models are provided. Simple things like k-fold cross validation are really trivial to build with ordination (but if you build the uncertainty of model building into the process --- like you should --- you must be very careful in collecting the data, as the variables can change in cross validation). Here is one 5-fold CV cycle with rda:

library(vegan)
data(mite, mite.env)
## 5-fold CV
k <- rep(1:5, len = nrow(mite))
## x is a matrix to collect predictions for two vars
x <- matrix(NA, nrow = nrow(mite), ncol = 2)
colnames(x) <- c("SubsDens", "WatrCont")
## shuffle for each CV
k <- sample(k)
## the next line could be broken into several commands within {}
for(i in 1:5)
    x[k == i, ] <- calibrate(rda(decostand(mite, "hell") ~ SubsDens + WatrCont,
                                 mite.env, subset = k != i),
                             newdata = decostand(mite[k == i, ], "hell"))

Easy, but not a very good prediction (cca would be marginally better, like it usually is). Cheers, Jari Oksanen On 29/03/2014, at 04:44 AM, Gavin Simpson wrote: In short, no. I haven't ported the rough code for LOO CV of CCA or CCA-PLS models.
I think I ported the mean centring and crossval functions from the Matlab sources, but not the code in the `example_crossvalCCA.m` file from the supplementary materials of the CoCA paper in Ecology. I could take a look and see how easy it will be to add this, but it doesn't sit well with cocorresp or vegan, as the former was designed really for CoCA and the latter doesn't have the other functionality needed (which exists in cocorresp), and we've not really implemented CV for ordination methods. That said, this is R and it is relatively trivial to write your own LOO or k-fold CV loop, and you can predict from a CCA model using the `predict()` method for cca objects available in vegan. Part of the reason, at least as far as I see things, for not having CV in the common ordination software (closed or open source) is that these methods tend not to be seen as purely predictive models, which is what CV is designed to evaluate. Don't hold your breath for me getting this into cocorresp, but if you want to follow up I might be persuaded to take a look and see if what is already in cocorresp will enable you to follow the code in the `example_crossvalCCA.m` file to write your own LOO code. HTH G On 28 March 2014 14:57, Jesse Becker jcbecke...@gmail.com wrote: Hello list, I am doing a concordance study between riverine environmental conditions, invertebrate, and fish assemblages. I am doing a predictive CoCA as part of the analysis with the cocorresp package. My question is whether there is an implementation of the cross-validation procedure in the cocorresp package that would work on the results of a CCA or RDA, without having to use MATLAB (which I don't have access to)? My understanding is that by doing the cross validation on the CCA (and hopefully RDA, although I've never seen it done) it allows for a more consistent evaluation of differences between the two methods. I haven't seen this as a function in vegan. Jari? Gavin? Thanks, Jesse Jesse C. Becker, Ph.D.
765.285.8889 office, 512.587.4428 cell, jcbec...@bsu.edu, jcbecke...@gmail.com -- Gavin Simpson, PhD
Re: [R-sig-eco] Vegan-Adonis-NMDS-SIMPER
Patches welcome. It is best to use vegan in GitHub. Pairwise tests are not high on my TODO list, because they are so much against what I've learnt from statistical theory, and I detest tests. Cheers, Jari Oksanen Sent from my iPad On 27.3.2014, at 17.54, Gavin Simpson ucfa...@gmail.com wrote: No, that will just consider the dispersions *about the centroids*, not location shifts of the centroids. The latter is what `adonis()` does, but we don't have pairwise comparisons (with/without permutation test) there or the Tukey post-hoc tests. I suppose we *could* automate the process that Steve suggests, just as I automated it for `betadisper()`, and I think this has been raised before, but it hasn't risen to the top of anyone's TODO list yet to actually see it implemented. Patches welcome :-) ! G On 27 March 2014 06:55, Johannes Björk bjork.johan...@gmail.com wrote: Hi, For that I believe you can run TukeyHSD.betadisper... to get significance values between levels. See ?TukeyHSD.betadisper Cheers, On Mar 27, 2014, at 1:47 PM, Brandon Gerig wrote: Hi Steve, Yes, this is precisely what I am interested in doing. It seems like betadisper might be a good way to visualize differences/similarities in the dispersion and examine differences among centroids for the levels within a factor. Am I correct in thinking that if I conduct additional PERMANOVA tests on a reduced data set, I could be evaluating differences between the levels of a main effect? Could anyone provide a citation for a paper that uses a similar procedure? On Wed, Mar 26, 2014 at 3:21 PM, Steve Brewer jbre...@olemiss.edu wrote: Brandon, Are you asking if you can use betadisper as a substitute for post-ANOVA pairwise comparisons among levels? After using betadisper to obtain dispersions, I believe you can plot the centroids for each level. In addition to telling you if the dispersions differ among levels, you could see how the centroids differ from one another. Is this what you want to know?
If so, realize that it won't give you pairwise significance tests for differences between levels. For that, you might want to do additional permanovas on reduced datasets containing only the two levels you want to compare. You could then adjust the p-values for multiple tests after the fact. Hope this helps, Steve J. Stephen Brewer Professor Department of Biology PO Box 1848 University of Mississippi University, Mississippi 38677-1848 Brewer web page - http://home.olemiss.edu/~jbrewer/ FAX - 662-915-5144 Phone - 662-915-1077 On 3/26/14 10:57 AM, Brandon Gerig bge...@nd.edu wrote: Thanks for the words of caution on simper. Am I completely off base in thinking that the betadisper function (analogous to Levene's test) could be used to examine variation between levels within main effects? Cheers On Mon, Mar 24, 2014 at 5:08 PM, Brandon Gerig bge...@nd.edu wrote: I am assessing the level of similarity between PCB congener profiles in spawning salmon and resident stream fish in stream reaches with and without salmon, to determine if salmon are a significant vector for PCBs in tributary foodwebs of the Great Lakes. My data set is arranged in a matrix where the columns represent the congener of interest and the rows represent either a salmon (migratory) or resident fish (non-migratory) from different sites. You can think of this in a manner analogous to columns representing species composition and rows representing site. Currently, I am using the function adonis to test for dissimilarity between fish species, stream reaches (with and without salmon) and lake basin (Superior, Huron, Michigan). The model statement is: m1 <- adonis(congener ~ FISH*REACH*BASIN, data = pcbcov, method = "bray", permutations = 999) The output indicates significant main effects of FISH, REACH, and BASIN and significant interactions between FISH and BASIN, and BASIN and REACH.
Is it best to then interpret this output via an NMDS ordination plot, or use something like the betadisper function to examine variances between main effect levels, or both? Also, can anyone recommend a procedure to identify the congeners that contribute most to the dissimilarity between fish, reaches, and basins? I was thinking of the SIMPER procedure but am not yet sold. Any advice is appreciated! -- Brandon Gerig PhD Student Department of Biological Sciences University of Notre Dame
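Steve's reduced-dataset procedure can be automated along these lines. This wrapper is an ad hoc sketch (pairwise_adonis is not a vegan function), built from vegan's adonis() and base R's p.adjust():

```r
## Ad hoc sketch of pairwise PERMANOVA: one adonis() per pair of levels,
## with Holm-adjusted p-values. Not a vegan function.
library(vegan)
pairwise_adonis <- function(comm, group, method = "bray", perm = 999) {
  group <- as.factor(group)
  pairs <- combn(levels(group), 2, simplify = FALSE)
  p <- vapply(pairs, function(lv) {
    keep <- group %in% lv
    g <- droplevels(group[keep])
    fit <- adonis(comm[keep, ] ~ g, method = method, permutations = perm)
    fit$aov.tab[1, "Pr(>F)"]
  }, numeric(1))
  data.frame(comparison = vapply(pairs, paste, character(1), collapse = " vs "),
             p = p, p.adj = p.adjust(p, method = "holm"))
}
```

The Holm adjustment keeps the family-wise error rate honest over the multiple reduced-dataset tests, which is the "adjust the p-values after the fact" step in Steve's description.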
Re: [R-sig-eco] report out by t.test
Except that the t-test does not assume that *observations* are normally distributed, nor that variances are equal. Avoid non-parametric tests: they assume too much of data properties. For the var.equal assumption in t.test, see ?t.test. Cheers, Jari Oksanen From: r-sig-ecology-boun...@r-project.org on behalf of Richard Boyce [boy...@nku.edu] Sent: 24 March 2014 13:23 To: r-sig-ecology@r-project.org Subject: Re: [R-sig-eco] report out by t.test Mike, There is no way that your data meet the assumptions of a t-test (normal distributions, equal variance). A nonparametric Mann-Whitney (aka Wilcoxon) test is much better suited to your data. Here's what I got when I ran it:

Q <- c(13,0,10,2,0,0,1,0,0,1,5)
WD <- c(0,0,1,0,0,0,0,0,0,0,1)
wilcox.test(Q, WD)

Wilcoxon rank sum test with continuity correction
data: Q and WD
W = 86.5, p-value = 0.05119
alternative hypothesis: true location shift is not equal to 0
Warning message: In wilcox.test.default(Q, WD) : cannot compute exact p-value with ties

This has a p-value quite close to 0.05, giving some evidence that there's a difference between your groups. Note that you now have different null and alternative hypotheses: groups are the same vs. groups are different. Rick Boyce On Mar 24, 2014, at 7:00 AM, r-sig-ecology-requ...@r-project.org wrote: Message: 1 Date: Sun, 23 Mar 2014 14:21:41 -0700 From: Michael Marsh sw...@blarg.net To: r-sig-ecology@r-project.org Subject: [R-sig-eco] report out by t.test I test differences between frequency of hits of exotic annual forbs in plots on two sites, Q and WD.
Q <- c(13,0,10,2,0,0,1,0,0,1,5)
WD <- c(0,0,1,0,0,0,0,0,0,0,1)
t.test(Q, WD)

Welch Two Sample t-test
data: Q and WD
t = 1.9807, df = 10.158, p-value = 0.07533
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -0.3342006 5.7887460
sample estimates: mean of x mean of y 2.9090909 0.1818182

The p-value is greater than 0.05 and thus does not reach the 95% confidence level, yet the difference in means is reported as not equal to 0. Am I encountering a one-sided versus two-sided comparison that I don't understand, or is there another explanation? Mike Marsh Richard L. Boyce, Ph.D. Director, Environmental Science Program Professor Department of Biological Sciences, SC 150 Northern Kentucky University Nunn Drive Highland Heights, KY 41099 USA 859-572-1407 (tel.) 859-572-5639 (fax) boy...@nku.edu http://www.nku.edu/~boycer/ = One of the advantages of being disorderly is that one is constantly making exciting discoveries. - A.A. Milne
Re: [R-sig-eco] Extract residuals from adonis function in vegan package
Dear Alicia Valdés, On 18/03/2014, at 13:53 PM, Alicia Valdés wrote: My problem is that I cannot figure out how to get residual values from the adonis model. You cannot get residuals from the output of adonis(). We could change the function so that this is possible, but the current function does not return the information needed for getting residuals. Neither would they be residuals in the traditional meaning of the word, as we are dealing with dissimilarities or distances, and these cannot be negative. We will have to discuss this among the vegan developers. Cheers, Jari Oksanen
Re: [R-sig-eco] Extract residuals from adonis function in vegan package
On 18/03/2014, at 15:23 PM, Eduard Szöcs wrote: Dear Alicia and Jari, just a thought: Couldn't capscale or betadisper be used for this? To obtain the distances to the group centroid? But then: how to convert this from distances to abundances? You can *almost* do this with capscale(), but not quite: for semimetric dissimilarities the results are not identical with capscale (they are identical with metric distances). The capscale() function also has fitted() and residuals() methods that both return dissimilarities. Now it also depends on what you mean by residuals. The capscale() interpretation, and the one I had in mind, is that 1) adonis(fitted(adonis(y ~ model)) ~ model) should give distances where the fitted part of adonis(y ~ model) is retained and the residual variation part should be null, and 2) adonis(residuals(adonis(y ~ model)) ~ model) should give distances where the fit would be null and the residual similar to that in the original adonis(y ~ model). It would be possible to develop such functions, but not with the current adonis() output. You can approximate both of these with capscale() and its fitted() and residuals() methods, but not exactly. The ecodist package of Sarah Goslee takes a different approach, and could return something usable (but I do not know that package very well). What really is needed depends on what you mean by residuals. Should they be dissimilarities (which cannot be negative) or straightforward residuals (which have an average of zero and some of which are negative)? Cheers, Jari Oksanen
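The capscale() approximation mentioned above looks like this in outline; the vegan vare* data sets and the three constraints are illustrative stand-ins, not from the thread:

```r
## Sketch of the capscale() approximation: fitted() and residuals()
## both return dissimilarities. Data set and constraints illustrative.
library(vegan)
data(varespec, varechem)
m <- capscale(vegdist(varespec) ~ N + P + K, data = varechem)
head(fitted(m))      # fitted dissimilarities
head(residuals(m))   # residual dissimilarities (non-negative)
```

As Jari notes, for a semimetric index like Bray-Curtis this is only an approximation; with metric distances the correspondence is exact.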
Re: [R-sig-eco] How to accommodate data with negative values for Canonical Correspondence Analysis in R using vegan Package
It depends on *where* you include the negative values. Negative values are OK as constraints (environmental variables, the right-hand side of the model formula). However, all marginal sums of the response data (the left-hand side of the model formula) must be above zero. It is technically possible to have some negative entries in the response matrix as long as the marginal sums are positive, but you really should not have them. The *CA family of methods was originally developed for non-negative data, and having negative entries usually indicates that your data are not at all suitable for the method. If you insist on the analysis, then you really must know what you are doing; but if you really know, you do not need to ask in R-sig-ecology. I still repeat: you can have negative data as constraints. If that fails, then something else is wrong with your data. If you want to use the CCA family of methods for negative response data, then RDA with equal scaling of variables is usable. Very commonly negative values also indicate that your response variables were measured in different scales and units, and therefore you must set scale=TRUE in the rda() call. Not knowing the data or any other details, this is a blind watchmaker recommendation. Cheers, Jari Oksanen From: r-sig-ecology-boun...@r-project.org on behalf of Ivailo [ubuntero.9...@gmail.com] Sent: 26 February 2014 16:23 To: Rajendra Mohan panda Cc: r-sig-ecology@r-project.org Subject: Re: [R-sig-eco] How to accommodate data with negative values for Canonical Correspondence Analysis in R using vegan Package On Wed, Feb 26, 2014 at 2:43 PM, Rajendra Mohan panda rmp.iit@gmail.com wrote: ... I have temperature data with negative values which I am not able to include in my CCA ordination. ... Rajendra, I am curious -- why are you not able to include the negative values in the CCA ordination? -- The cure for boredom is curiosity. There is no cure for curiosity.
-- Dorothy Parker
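The RDA fallback described above can be sketched as follows; varechem is only a stand-in for the poster's temperature data:

```r
## Sketch: centring introduces negative values; rda() (PCA here)
## accepts them, while cca() would not. varechem is a stand-in data set.
library(vegan)
data(varechem)
chem <- scale(varechem)   # centred and scaled: now contains negatives
pca <- rda(chem)          # fine; equivalently rda(varechem, scale = TRUE)
## cca(chem) would fail: *CA methods need non-negative data
```

Using scale = TRUE inside rda() is the one-step equivalent when the variables are in different units, as recommended in the reply.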
Re: [R-sig-eco] cca
On 20/12/2013, at 10:15 AM, Mahnaz Rabbaniha wrote: Dear all, I have done CCA in the vegan package; the bio file is the zooplankton genera and the m file is the hydrobiological groups. vare.cca <- cca(bio, m) plot(vare.cca) The main question is about this plot: in the plot we only see the hydrobiological variables. Is there any way or code in plot that also shows the names of the zooplankton groups? Dear Mahnaz Rabbaniha, Have you tried plot(vare.cca, type = "t")? Cheers, Jari Oksanen
Re: [R-sig-eco] Ranked abundance distribution
Hello, On 17/12/2013, at 17:01 PM, Sol Noetinger wrote: Hello,

Cluster II
RAD models, family poisson
No. of species 35, total abundance 100

             par1      par2     par3   Deviance  AIC  BIC
Null                                    25.7004  Inf  Inf
Preemption   0.1                        27.8760  Inf  Inf
Lognormal    0.21756   1.3473            4.7797  Inf  Inf
Zipf         0.27724  -1.0959            4.9038  Inf  Inf
Mandelbrot   0.64175  -1.3825   1        4.9181  Inf  Inf

I read in the manual that to see which model fits better you use the AIC values. What is the meaning of getting infinite values? It seems you can get infinite AIC and BIC when the distribution you selected is in conflict with the nature of your data. For instance, when you postulate a Poisson model (like in your case), but your data are not integers (counts). Was that the case with you? Distribution families that go along with non-integer (real) data are gaussian and Gamma. You can use neither quasipoisson nor other quasi models, because these do not have AIC. Can I use the Deviance value to compare the models? And in case I can use the deviance, since there are very close values, should I run a test to see if the differences are significant? In that case, which one? The models have different numbers of estimated parameters and they are not nested. Many people claim that you can use neither deviance nor AIC (and they are right). At least you must take into account the number of estimated parameters, which varies from zero to three. Cheers, Jari Oksanen
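A sketch of the refit this answer points to, with hypothetical non-integer cover values and the Gamma family (radfit() is the vegan function behind these tables; whether AIC becomes finite depends on the data actually matching the family):

```r
## Sketch: non-integer abundances with a family that allows real values.
## The cover values are hypothetical.
library(vegan)
cover <- c(35.2, 20.1, 12.5, 8.3, 5.0, 3.3, 2.2, 1.4)
radfit(cover, family = Gamma)  # AIC/BIC should now be finite
```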
Re: [R-sig-eco] Mean distance values (and errors) between groups of samples using vegan
Function meandist() should do this. Its name is kind of a hint. Cheers, Jari Oksanen On 16/12/2013, at 21:16 PM, Andres Mellado Diaz wrote: Dear list members, I would greatly appreciate any suggestion on how to get averaged distance values (and errors) between groups of samples using vegan (vegdist, dist, ...). As an example, I have a faunal table (samples by taxa, mMCINVmmh) and some grouping factors (for example gradreg_mmh) that I want to test, like this:

vegdist1 <- vegdist(mMCINVmmh, method = "bray", binary = FALSE, diag = FALSE, upper = FALSE)
vegdist1
          TR_1_2    TR_1_3    TR_2_2    TR_2_3    TR_3_1    TR_3_2
TR_1_3 0.7361704
TR_2_2 0.1509007 0.7785157
TR_2_3 0.6465259 0.4095644 0.6832353
TR_3_1 0.8717468 0.9572968 0.8551546 0.9373980
TR_3_2 0.6925163 0.7197110 0.7428020 0.4742819 0.7800620
TR_3_3 0.8166542 0.7199418 0.8365397 0.6215122 0.8582932 0.5364242

gradreg_mmh
[1] 0 0 3 3 5 5 5
Levels: 0 3 5

So I want to get the pairwise mean distances and errors between groups 0, 3 and 5... Many thanks in advance, Andrés -- Andrés Mellado Díaz Centre for Hydrological Studies CEH-CEDEX Water Quality Department Pº bajo de la Virgen del Puerto, 3 28005, Madrid SPAIN
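A self-contained sketch of the meandist() call; the community matrix and grouping below are simulated stand-ins for the poster's mMCINVmmh and gradreg_mmh:

```r
## Sketch with simulated data: 7 samples in groups 0/3/5 as in the post.
library(vegan)
set.seed(1)
comm <- matrix(rpois(7 * 5, lambda = 5), nrow = 7)
grp <- factor(c(0, 0, 3, 3, 5, 5, 5))
md <- meandist(vegdist(comm, method = "bray"), grouping = grp)
md           # within-group means on the diagonal, between-group off it
summary(md)  # overall within/between summary statistics
```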
Re: [R-sig-eco] NA error in envfit
On 05/12/2013, at 18:42 PM, Dixon, Philip M [STAT] wrote: I wonder if the problem is a factor level with no observations. One of the frustrating things about factors (class variables) in R is that the list of levels is stored separately from the data. This can cause all sorts of problems if you create the factor, then subset the data, and the subset is missing one or more levels of the factor. You are subsetting your data, so this may be the source of the problem. My working philosophy is to keep variables as character strings or numbers until just before I need the factors. That avoids any issues with extraneous levels. That means reading data sets (.txt or .csv files) with as.is=TRUE to avoid the default creation of factors. relevel() may recreate the list of levels. I usually use factor(as.character(variable)) to flip a factor to a vector of character strings and then back to a factor with the correct set of levels. Philip, It very much looks like this kind of approach is the source of all evil. We *assume* in envfit() that if a variable is not a factor, then it is numeric. If it is a character string instead of being numeric, you get those strange error messages. We do take care of extraneous factor levels in envfit, but we expected that variables are either factors or numeric -- we did not expect character strings. I guess we have to add some ugly code to handle these cases and either cast character strings to factors or ignore variables that are neither numeric nor factors. Cheers, Jari Oksanen
Re: [R-sig-eco] NA error in envfit
Kendra, Are you sure that it was a factor? I am unable to trigger an error with a one-level factor in vegan 2.0-9. Moreover, the error message you sent was from vectorfit, and factors (also one-level factors) are not handled in that function: they go to factorfit, and the error should come from factorfit. Character string variables go to vectorfit (instead of factorfit) and give exactly that error, from vectorfit. I don't ask these things out of meanness, but because I want to fix these functions for the next release. I have now found one problem and I have fixed that in R-Forge. If there are some other problems, I want to fix them, too. Therefore I really want to know what happened with your application. I try to reproduce your problems, but this is kind of blind watchmaker's work, as I don't have a reproducible test case. Therefore I have to ask stubbornly. Cheers, Jari Oksanen On 06/12/2013, at 21:35 PM, Mitchell, Kendra wrote: My offending variable was correctly imported as a factor, but since I was subsetting the data to look only at one zone at a time, it was a factor with only one level. -- Kendra Maas Mitchell, Ph.D. Post Doctoral Research Fellow University of British Columbia 604-822-5646 From: Gavin Simpson [ucfa...@gmail.com] Sent: Friday, December 06, 2013 11:09 AM To: Dixon, Philip M [STAT] Cc: Mitchell, Kendra; r-sig-ecology@r-project.org Subject: Re: [R-sig-eco] NA error in envfit Philip, Your approach to using factors misses an important consideration: the class that was observed in the full dataset should not disappear just because you subsetted the data in some manner. Also, `droplevels()` is a useful function to call on a factor or data frame if subsetting produces levels with zero observations and that information is not made use of in whatever computations follow next. G On 5 December 2013 10:42, Dixon, Philip M [STAT] pdi...@iastate.edu wrote: Kendra, I wonder if the problem is a factor level with no observations.
Best wishes, Philip Dixon -- Gavin Simpson, PhD
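The pitfall and the droplevels() fix discussed in this thread, in a base-R nutshell (the level names are made up):

```r
## Subsetting keeps empty factor levels; droplevels() removes them.
f <- factor(c("bIDF", "bSBS", "bSBS"))
sub <- f[f == "bSBS"]
levels(sub)             # "bIDF" "bSBS" -- the empty level lingers
levels(droplevels(sub)) # "bSBS" only
```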
Re: [R-sig-eco] NA error in envfit
Hello, I think I saw something like this in autumn (northern hemisphere) when a variable had constant values with no variation, and envfit did not know how to scale the arrow. We fixed this in the development version of vegan in R-Forge on September 29. Unfortunately R-Forge is again dysfunctional and cannot build the package, but if you are able to do that yourself you can try it and see if the problem is fixed there. The same files are also in github, but you need to build the package yourself there too. I'm working on a minor release of vegan (2.0-10) which may be published on Monday 9 Dec, but there are no guarantees that it will have this envfit fix or be released as planned (you know, the best laid plans of mice and men...). It may be easiest to see if a constant variable is the culprit, and remove that if needed. If this is not the case, we need more info and a *reproducible* example. We haven't got one now, and I cannot reproduce your problem. Cheers, Jari Oksanen From: r-sig-ecology-boun...@r-project.org on behalf of Stephen Sefick [sas0...@auburn.edu] Sent: 04 December 2013 22:01 To: Mitchell, Kendra Cc: r-sig-ecology@r-project.org Subject: Re: [R-sig-eco] NA error in envfit Kendra, Something is wrong in X or P; find out what the foreign function call is and then you may be able to track down the offending data problem. Maybe a logarithm somewhere? This is probably not much help; I don't have much experience with envfit. Stephen On 12/03/2013 07:06 PM, Mitchell, Kendra wrote: I'm running a bunch of NMS with vectors fitted (slicing and dicing a large dataset in different ways). I'm suddenly getting an error from envfit:

f.bSBS.org.fit <- envfit(f.bSBS.org.nms, f.bSBS.org.env, permutations = 999, na.rm = TRUE)
Error in vectorfit(X, P, permutations, strata, choices, w = w, ...) :
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In vectorfit(X, P, permutations, strata, choices, w = w, ...)
: NAs introduced by coercion

I can plot the NMS and even run ordifit on individual env variables, so I can't figure out what the problem is. There aren't any NA/NaN/Inf in either of those data that I can find. I've tried running it without na.rm=TRUE and still get the error. Guidance on how to fix this would be appreciated. Here's the whole slicing process and str for the data:

f.bSBS.org <- f.env$zone.hor == "bSBS.1"
f.bSBS.org.tyc <- f.tyc[f.bSBS.org, f.bSBS.org]
f.bSBS.org.env <- subset(f.env, f.env$zone.hor == "bSBS.1")
f.bSBS.org.nms <- metaMDS(as.dist(f.bSBS.org.tyc), k = 3, trymin = 50, trymax = 250, wascores = FALSE)
f.bSBS.org.fit <- envfit(f.bSBS.org.nms, f.bSBS.org.env, permutations = 999, na.rm = TRUE)

str(f.bSBS.org.env)
'data.frame': 63 obs. of 14 variables:
 $ zone         : Factor w/ 6 levels "bIDF","bSBS",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ site         : Factor w/ 18 levels "A7","A8","A9",..: 12 12 12 12 12 12 12 12 12 12 ...
 $ om           : Factor w/ 4 levels "0","1","2","3": 2 2 2 3 3 3 2 2 2 3 ...
 $ compaction   : num 1 1 1 1 1 1 1 1 1 1 ...
 $ herbicide    : num 0 0 0 0 0 0 0 0 0 0 ...
 $ horizon      : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ Water_content: num 50.3 50.3 50.3 50.1 50.1 ...
 $ DNA_ug_g     : num 71.2 71.2 71.2 68.6 68.6 ...
 $ C            : num 30.5 30.5 30.5 28.4 28.4 ...
 $ N            : num 0.863 0.863 0.863 0.81 0.81 ...
 $ pH_H2O       : num 4.63 4.63 4.63 4.49 4.49 ...
 $ CN           : num 35.3 35.3 35.3 35.1 35.1 ...
 $ f.env$zone   : Factor w/ 6 levels "bIDF","bSBS",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ zone.hor     : chr "bSBS.1" "bSBS.1" "bSBS.1" "bSBS.1" ...

str(f.bSBS.org.nms)
List of 35
 $ nobj  : int 63
 $ nfix  : int 0
 $ ndim  : num 3
 $ ndis  : int 1953
 $ ngrp  : int 1
 $ diss  : num [1:1953] 0.00424 0.00437 0.05169 0.07522 0.11039 ...
 $ iidx  : int [1:1953] 12 8 55 56 52 7 56 12 59 52 ...
 $ jidx  : int [1:1953] 7 6 18 55 8 3 56 3 12 49 ...
 $ xinit : num [1:189] 0.654 0.837 0.438 0.105 -0.313 ...
 $ istart: int 1
 $ isform: int 1
 $ ities : int 1
 $ iregn : int 1
 $ iscal : int 1
 $ maxits: int 200
 $ sratmx: num 1
 $ strmin: num 1e-04
 $ sfgrmn: num 1e-07
 $ dist  : num [1:1953] 0.0679 0.0231 0.3598 0.1248 0.1422 ...
 $ dhat  : num [1:1953] 0.0455 0.0455 0.2076 0.2076 0.2076 ...
 $ points: num [1:63, 1:3] -0.1256 0.1224 0.267 0.2374 -0.0427 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:63] "LL001" "LL002" "LL003" "LL007" ...
  .. ..$ : chr [1:3] "MDS1" "MDS2" "MDS3"
  ..- attr(*, "centre")= logi TRUE
  ..- attr(*, "pc")= logi TRUE
  ..- attr(*, "halfchange")= logi FALSE
 $ stress: num 0.157
 $ grstress : num 0.157
 $ iters : int 180
 $ icause: int 3
 $ call  : language metaMDS(comm = as.dist(f.bSBS.org.tyc), k = 3, trymax = 250, wascores = FALSE
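A quick base-R check for the constant-variable culprit suspected earlier in the thread (the data frame here is a made-up stand-in for f.bSBS.org.env):

```r
## Flag numeric columns with zero variance: envfit cannot scale an
## arrow for them. 'env' is a made-up stand-in data frame.
env <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5), f = factor(c("x", "y", "x")))
num <- vapply(env, is.numeric, logical(1))
constant <- vapply(env[num], function(v) var(v, na.rm = TRUE) == 0, logical(1))
names(constant)[constant]  # "b": drop this column before calling envfit()
```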
Re: [R-sig-eco] NA error in envfit
It is easy if you have C and Fortran compilers plus unix tools. I assume most people do not have those. Then 'easy' is quite a different concept. Cheers, Jari Oksanen original message From: Hadley Wickham Sent: 05.12.2013, 16:19 To: Eduard Szöcs Cc: r-sig-ecology@r-project.org Subject: Re: [R-sig-eco] NA error in envfit # install vegan from github install_github('vegan', 'jarioksa') BTW I recommend using this form: install_github('jarioksa/vegan') Hadley -- http://had.co.nz/
Re: [R-sig-eco] Community composition variance partitioning?
Hi, Not only odd, but impossible. If you have a model y ~ x1 and you *add* a new explanatory variable, you cannot get worse in raw R2. You can get worse in adjusted R2. You can also get worse if you add variables to a matrix for which you calculate distances: so dist(y) ~ dist([x1]) can have higher R2 than dist(y) ~ dist([x1,x2]) -- bioenv is based on this. Cheers, Jari Oksanen Sent from my iPad On 4.12.2013, at 20.19, Sarah Goslee sarah.gos...@gmail.com wrote: Hi, That seems a bit odd: can you provide a reproducible example, off-list if necessary? Sarah On Wed, Dec 4, 2013 at 12:50 PM, Alexandre Fadigas de Souza alexso...@cb.ufrn.br wrote: Dear friends, My name is Alexandre and I am trying to analyze a dataset on floristic composition of tropical coastal vegetation by means of variance partitioning, following the outlines of Tuomisto's recent papers, especially Tuomisto, H., Ruokolainen, L., Ruokolainen, K., 2012. Modelling niche and neutral dynamics: on the ecological interpretation of variation partitioning results. Ecography 35, 961-971. I have a doubt; could you please give your opinion on it? We are performing a variance partitioning of the Bray-Curtis floristic distance using as explanatory fractions soil nutrition, topography, canopy openness and geographical distances (all as Euclidean distance matrices). We are using the MRM function of the ecodist package:

mrm <- MRM(dist(species) ~ dist(soil) + dist(topograph) + dist(light) + dist(xy), data=my.data, nperm=1

The idea is that the overall R2 of this multiple regression should be used to assess the contributions of the spatial and environmental fractions through subtraction: Three separate multiple regression analyses are needed to assess the relative explanatory power of geographical and environmental distances. All of these have the same response variable (the compositional dissimilarity matrix), but each analysis uses a different set of the explanatory variables.
In these analyses the explanatory variables are: (I) the geographical distance matrix only, (II) the environmental difference matrices only, and (III) all the explanatory variables used in (I) or (II). Comparing the R2 values from these three analyses allows partitioning the variance of the response dissimilarity matrix into four fractions. Fraction A is explained uniquely by the environmental difference matrices and equals R2(III) - R2(I). Fraction B is explained jointly by the environmental and geographical distances and equals R2(I) + R2(II) - R2(III). Fraction C is explained uniquely by geographical distances and equals R2(III) - R2(II). Fraction D is unexplained by the available environmental and geographical dissimilarity matrices and equals 100% - R2(III) (throughout the present paper, R2 values are expressed as percentages rather than proportions). [Tuomisto et al. 2012] The problem is that the R2 of the overall model (containing all the explanatory variables) is smaller than most of the R2 values of the models containing each of the explanatory matrices separately. So it seems not possible to proceed with the approach proposed. Sincerely, Alexandre Dr. Alexandre F. Souza Professor Adjunto II Departamento de Botanica, Ecologia e Zoologia Universidade Federal do Rio Grande do Norte (UFRN) http://www.docente.ufrn.br/alexsouza Curriculo: lattes.cnpq.br/7844758818522706 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Sarah Goslee http://www.stringpage.com http://www.sarahgoslee.com http://www.functionaldiversity.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
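Jari's point that raw R2 cannot decrease when a predictor is added, while adjusted R2 can, is easy to check with a small simulated sketch (variable names invented):

```r
set.seed(1)
y  <- rnorm(30)
x1 <- rnorm(30)
x2 <- rnorm(30)  # pure noise, unrelated to y

m1 <- summary(lm(y ~ x1))
m2 <- summary(lm(y ~ x1 + x2))

# Raw R2 is monotone for nested OLS models: adding x2 cannot lower it.
m2$r.squared >= m1$r.squared  # TRUE
# Adjusted R2 penalizes the extra parameter, so it may go down:
c(m1$adj.r.squared, m2$adj.r.squared)
```

The exception Jari names is when the added variables go *inside* a single distance matrix: dist([x1, x2]) is a different predictor than dist([x1]), not a nested extension of it, so the monotonicity guarantee no longer applies.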
Re: [R-sig-eco] NA in species scores using capscale in vegan
On 07/11/2013, at 12:05 PM, Duarte Viana wrote: Hello all, I am doing a CAP (or dbRDA) in vegan, using the function capscale, where the response is a dissimilarity matrix (Sorensen's distance index) and the explanatory matrix consists of 19 variables. The code runs apparently fine, but the species scores appear as NA (site scores are properly given). Does anyone know what might be the problem? Duarte, Did you supply species data? Species scores cannot be estimated without information on species. If your response is a dissimilarity, there is no information about the original species. You can supply the original community data (which has the information on species) as argument 'comm' in capscale(). Cheers, Jari Oksanen ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
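A hedged sketch of the fix Jari describes, using vegan's bundled dune data (Bray-Curtis here stands in for the poster's Sorensen matrix; assumes vegan is installed):

```r
library(vegan)
data(dune)
data(dune.env)

# A plain dissimilarity object carries no species information:
d <- vegdist(dune, method = "bray")
m1 <- capscale(d ~ Management, dune.env)   # species scores unavailable

# Supplying the community matrix via 'comm' lets capscale compute
# species scores as weighted averages:
m2 <- capscale(d ~ Management, dune.env, comm = dune)
head(scores(m2, display = "species"))
```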
Re: [R-sig-eco] Multivariate Analyses of Ecological Communities
On 04/11/2013, at 17:59 PM, Rich Shepard wrote: I've read the docs for anosim, adonis, and mrpp and am currently reading the vegan tutorial. Toward the end of the anosim and mrpp docs the author writes that each has potential shortcomings and recommends that adonis be used instead. This leads to my two questions: 1) Why are the anosim and mrpp functions still maintained and included? That they are suggests question 2. To serve the people? 2) Under what circumstances are these two functions appropriately used? I don't know, but since they are available, it is possible to find out. I have suggested in the documentation that adonis() usually is better, but it also suffers from similar problems as these alternatives. My analysis suggests that adonis() is more robust and reliable. However, not everybody agrees, and alternatives are provided for these standard methods. Since the alternatives are provided, you can study the issue independently. The existence of a method is not a sufficient reason to use that method. Cheers, Jari Oksanen ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Cross validate model with calibrate
Dear Philipp, I cannot see anything wrong in your approach. The only problem is that RDA may not do very well. It is very common that the calibrated values are outside the original range of the data, and if the predictions are not very good, they can also be unrealistic values. RDA has no idea what is realistic for given data. You get cross-validated values outside the data range even in the Dune meadow data you used. Here is an example that applies 4-fold cross-validation (in line with the 3/4 subsetting in your example):

library(vegan)
data(dune, dune.env)
vec <- rep(1:4, length = nrow(dune))
pred <- rep(NA, length = nrow(dune))
set.seed(4711)
take <- sample(vec)
for(i in 1:4)
    pred[take == i] <- calibrate(rda(dune ~ A1, dune.env, subset = take != i),
                                 newdata = subset(dune, take == i))
range(pred)
## [1] 0.9104486 12.3583826
with(dune.env, range(A1))
## [1] 2.8 11.5

The predicted values extend nearly 2 cm below and 1 cm above the data values. In this case the predicted values were not negative, but that could happen, too. The tool (RDA) may not work very well in all cases, but RMSE will tell you how good RDA is for the job. If it is bad, do not use it. It seems RDA is very poor in this dune example: RMSE is larger than the standard deviation of A1 (but you could almost guess this by looking at the low constrained eigenvalues of the model). The code above uses the 'subset' argument of vegan::rda and the subset() function of R to make the code a bit easier to write. Cheers, Jari Oksanen On 01/11/2013, at 15:53 PM, Philipp01 wrote: Hello all, I would like to cross-validate my rda() derived model with the calibrate function (vegan package) and calculate the RMSE as a performance measure. For simplicity I use the example from the predict.cca {vegan} help.
library(ade4)
library(vegan)
data(dune)
data(dune.env)
nbr <- as.numeric(rownames(dune))
library(caret)
inTrain <- createDataPartition(y = nbr, p = 3/4, list = FALSE, times = 1)
train.dune <- dune[inTrain[,1],]; test.dune <- dune[-inTrain[,1],]
train.dune.env <- dune.env[inTrain[,1],]; test.dune.env <- dune.env[-inTrain[,1],]
mod <- rda(train.dune ~ A1, train.dune.env)
cal <- calibrate(mod, newdata = test.dune)
with(test.dune.env, plot(A1, cal[,"A1"] - A1, ylab = "Prediction Error"))
abline(h = 0)
error <- cal - test.dune.env$A1
(rmse <- sqrt(mean(error^2)))

When I apply this code snippet to my very own data I get positive and negative cal values, which would be unrealistic for parameters such as tree height (etc.). Therefore, I doubt that my approach is correct. How do you compute the RMSE for the rda() derived model? Regards, Philipp -- View this message in context: http://r-sig-ecology.471788.n2.nabble.com/Cross-validate-model-with-calibrate-tp7578486.html Sent from the r-sig-ecology mailing list archive at Nabble.com. ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] angular statistics
If you use these, remember that R cos() needs its argument in radians. Cheers, Jari Oksanen Sent from my iPad On 18.10.2013, at 8.40, Ivailo ubuntero.9...@gmail.com wrote: On Fri, Oct 18, 2013 at 6:16 AM, Michael Marsh sw...@blarg.net wrote: If you want a measure of exposure, i.e., heat, I suggest using the heat load transformation suggested by McCune and Grace (2002). Their assumption is that mid-afternoon, when the sun is in the southwest, is usually the warmest time of day. The formula at the end of Chapter 3 follows: heat load index = (1 - cos(degrees - 45))/2 McCune, Bruce and James B. Grace. 2002. Analysis of ecological communities. MJM Software Design. Gleneden Beach, Oregon. USA Thanks for the interesting discussion! I'd like to add that although I don't have the book, I found the radiation measures presented in the following paper: McCune, B. and D. Keon. 2002. Equations for potential annual direct incident radiation and heat load. Journal of Vegetation Science 13:603-606. Cheers, Ivailo -- UBUNTU: a person is a person through other persons. ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
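The quoted index, with the degree-to-radian conversion Jari warns about, can be written as a small helper (a sketch; heat_load is an invented name, the formula is the one given above):

```r
# Heat load index of McCune & Grace (2002): aspect in degrees, with
# SW-facing (225 deg) slopes hottest. R's cos() expects radians,
# hence the pi/180 conversion.
heat_load <- function(aspect_deg) (1 - cos((aspect_deg - 45) * pi / 180)) / 2

heat_load(45)    # 0: NE-facing slope, coolest
heat_load(225)   # 1: SW-facing slope, hottest
```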
Re: [R-sig-eco] vegan RsquareAdj() for lm models
Specific reason is that nobody has implemented this. These things don't come by automatic writing; somebody must do them. What would you expect to get? Is this what was on your mind:

sapply(summary(lm(yy ~ xx - 1)), function(x) c(r.squared = x$r.squared, adj.r.squared = x$adj.r.squared))

              Response Y1 Response Y2 Response Y3 Response Y4
r.squared      0.06845032  0.04788037  0.01702738  0.11253059
adj.r.squared -0.01255400 -0.03491264 -0.06844849  0.03535934

This could be implemented, but (1) is this what you or anybody else would like to have?, and (2) how many things would it break by returning several values instead of one? If you want to have this, you really do not need to use vegan. vegan:::RsquareAdj.lm() takes its results from summary(lm_object). You can use stats:::summary.lm directly. Cheers, Jari Oksanen From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] on behalf of Paolo Piras [paolo.pi...@uniroma3.it] Sent: 03 October 2013 14:59 To: r-sig-ecology@r-project.org Subject: [R-sig-eco] vegan RsquareAdj() for lm models Dear list, I would like to easily compute the adjusted R-square in an lm model without intercept (excluding the intercept is essential for my analysis). I found that RsquareAdj() in vegan returns NA if the argument is a multiple-multivariate lm model, thus including multivariate responses and multiple predictors, while it works for a univariate response and multiple predictors. For example:

library(vegan)
yy <- matrix(rnorm(200, 0, 1), ncol = 4)
xx <- matrix(rnorm(200, 0, 1), ncol = 4)
RsquareAdj(lm(yy ~ xx - 1))
RsquareAdj(lm(yy[,1] ~ xx - 1))

Is there some specific reason for this behavior? Thanks in advance for any advice. best regards Paolo ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] vegan RsquareAdj() for lm models
Paolo, See ?RsquareAdj for the call interface. The default method can be called as RsquareAdj(x, n, m), and in the default method x is the unadjusted correlation, n is the number of observations and m is the number of parameters (degrees of freedom) in the fitted model. Specific methods for univariate lm or for rda (and some others) find these variables in the result object, but then they just call the default method with the found x, n and m. You can build your model on that. It is possible to build a specific function for mlm objects, but nobody has done so in vegan. You cannot build an rda design without an intercept. It was a conscious design decision to make this impossible without hacking the rda.default code (I even say this in the decision-vegan vignette). I am not going to make it easy to have non-centred RDA: I care too much about people and I don't want to do evil. If you really *know* that you need non-centred RDA, then you know how to change those lines of code in rda.default. Cheers, Jari Oksanen From: Paolo Piras [paolo.pi...@uniroma3.it] Sent: 03 October 2013 15:52 To: Jari Oksanen; r-sig-ecology@r-project.org Subject: RE: [R-sig-eco] vegan RsquareAdj() for lm models Thank you very much Jari! I think that it is nearly ok. What I would like to have is the same as in RsquareAdj(vegan::rda(yy,xx)), that is, a GLOBAL measure of the association, BUT...I want it for a multiple-multivariate lm model that does not include the intercept; an alternative could be to build an rda design for the exclusion of the intercept but I really cannot figure out how to do it.
I think I just need to compute the average of the single adjusted r squared values from the output of your line of code, but the results are not identical.

# EXAMPLE WITH INTERCEPT IN ORDER TO COMPARE WITH RDA
RsquareAdj(vegan::rda(yy, xx))
mean(sapply(summary(lm(yy ~ xx)), function(x) c(r.squared = x$r.squared, adj.r.squared = x$adj.r.squared))[2,])

Or I just miss something in this computation. Thanks again for any further suggestion.

From: Jari Oksanen jari.oksa...@oulu.fi
Sent: Thursday 3 October 2013, 14.25
To: Paolo Piras; r-sig-ecology@r-project.org
Subject: RE: [R-sig-eco] vegan RsquareAdj() for lm models

Specific reason is that nobody has implemented this. These things don't come by automatic writing; somebody must do them. What would you expect to get? Is this what was on your mind:

sapply(summary(lm(yy ~ xx - 1)), function(x) c(r.squared = x$r.squared, adj.r.squared = x$adj.r.squared))

              Response Y1 Response Y2 Response Y3 Response Y4
r.squared      0.06845032  0.04788037  0.01702738  0.11253059
adj.r.squared -0.01255400 -0.03491264 -0.06844849  0.03535934

This could be implemented, but (1) is this what you or anybody else would like to have?, and (2) how many things would it break by returning several values instead of one? If you want to have this, you really do not need to use vegan. vegan:::RsquareAdj.lm() takes its results from summary(lm_object). You can use stats:::summary.lm directly.
Cheers, Jari Oksanen From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] on behalf of Paolo Piras [paolo.pi...@uniroma3.it] Sent: 03 October 2013 14:59 To: r-sig-ecology@r-project.org Subject: [R-sig-eco] vegan RsquareAdj() for lm models Dear list, I would like to easily compute the adjusted R-square in an lm model without intercept (excluding the intercept is essential for my analysis). I found that RsquareAdj() in vegan returns NA if the argument is a multiple-multivariate lm model, thus including multivariate responses and multiple predictors, while it works for a univariate response and multiple predictors. For example:

library(vegan)
yy <- matrix(rnorm(200, 0, 1), ncol = 4)
xx <- matrix(rnorm(200, 0, 1), ncol = 4)
RsquareAdj(lm(yy ~ xx - 1))
RsquareAdj(lm(yy[,1] ~ xx - 1))

Is there some specific reason for this behavior? Thanks in advance for any advice. best regards Paolo ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
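The default method Jari describes, RsquareAdj(x, n, m), applies the ordinary (Ezekiel) adjustment. As a base-R sketch (adj_r2 is an invented name), it can be checked against what summary.lm reports for a univariate model:

```r
# x = unadjusted R2, n = number of observations, m = model degrees of
# freedom (number of predictors, intercept excluded).
adj_r2 <- function(x, n, m) 1 - (1 - x) * (n - 1) / (n - m - 1)

set.seed(42)
y  <- rnorm(50)
x1 <- rnorm(50)
x2 <- rnorm(50)
s <- summary(lm(y ~ x1 + x2))

# Matches summary.lm's own adjusted R2 for n = 50, m = 2 predictors:
all.equal(adj_r2(s$r.squared, n = 50, m = 2), s$adj.r.squared)  # TRUE
```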
Re: [R-sig-eco] null model for testing nestedness
Valerie, There are at least two problems here: the way you call oecosimu() and how nestednodf(..., weighted = TRUE) works with binary data. If you specify a *binary* null model as method, then you will get binary data. Even if you supplied quantitative data, they are transformed into 1/0 (presence/absence) data. You specified method = "quasiswap", and that is a binary model. Another problem is that nestednodf(..., weighted = TRUE) seems to evaluate the statistics all as zeros if you request a weighted (= quantitative data) analysis of non-quantitative (binary) data. It cannot perform a weighted analysis if there are no weights, but still I think it should return something else than zeros. We'll have a look at that issue. You should specify a non-binary null model if you want to have a non-binary (weighted) analysis. Quantitative null models are problematic, and the vegan release version does not have much choice here. I think r2dtable may be the only one. The development version of vegan in http://www.r-forge.r-project.org/ has a wider range of non-binary null models, but I think you need to be brave to use quantitative null models. They are something for people who are not afraid of going to areas where angels fear to tread. FWIW, weighted nestednodf seems to work in oecosimu if you ask for a quantitative null model (r2dtable in my tests) both with the release version (2.0-8 or 2.0-9) and with the development version (2.1-35 or 2.1-36). But you really need to specify a quantitative null model. Both null models and oecosimu are completely re-written and re-designed in the development versions. Cheers, Jari Oksanen On 25/09/2013, at 15:56 PM, v_coudr...@voila.fr wrote: Thank you very much. Yes it is working with oecosimu, except that it does not seem to work for weighted data. There is the possibility to specify weighted = TRUE:

oecosimu(matrix, nestednodf, method = "quasiswap", nsimul = 999, order = FALSE, weighted = TRUE)

However, I get only null values and p=1.
For weighted = F, I get good values. Best wishes ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
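A minimal sketch of the pairing Jari recommends: weighted nestednodf with the quantitative r2dtable null model (assumes vegan is installed; the matrix is invented toy data):

```r
library(vegan)
set.seed(1)
m <- matrix(rpois(30, lambda = 3), nrow = 5)  # toy quantitative matrix

# Quantitative null model + weighted statistic; extra arguments
# (order, weighted) are passed through to nestednodf.
res <- oecosimu(m, nestednodf, method = "r2dtable", nsimul = 99,
                order = FALSE, weighted = TRUE)
res
```

With a binary null model such as "quasiswap" in place of "r2dtable", the simulated matrices are presence/absence, which is what produced the all-zero weighted statistics in the question above.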
Re: [R-sig-eco] subsetting lower triangle distance matrix based on variable in another object
Kendra, Subset a matrix and convert to dist. You must subset both rows and columns in the same way, and therefore you must use a subsetting index instead of the subset() function. Let k be a vector that is TRUE for selected items; as.dist(yoursymmetricmatrix[k, k]) is what you are looking for. Cheers, Jari Oksanen

original message
From: Mitchell, Kendra
Sent: 31.07.2013, 02:38
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] subsetting lower triangle distance matrix based on variable in another object

I have a number of dissimilarity matrices built in another program that I'd like to subset and then run NMS.

# read in distance matrix
b_bc03 <- data.matrix(read.table("bac_final.an.thetayc.0.03.lt.ave.dist", row.names=1, header=T, fill=TRUE))
# creates a 736x736 double matrix

# read in environmental variables
bac_env <- read.csv("bac_samples.csv", row.names=1, header=T)
# 736 obs. of 6 variables

# The only way that I've figured out how to subset a lt matrix is to
# convert to a dist object first
bc_dist <- as.dist(b_bc03)  # dist[27048]
tx_dist <- subset(bc_dist, bac_env$zone == "TX")  # numeric[54316]

Here is where it falls apart: the subset doesn't keep the labels, just the values, so I can't convert it back to a distance matrix. I've looked through ecodist but can't find any way in that package to select only certain samples. Help please? thanks Kendra -- Kendra Maas Mitchell, Ph.D. Post Doctoral Research Fellow University of British Columbia 604-822-5646 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
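Jari's recipe in runnable form on a toy symmetric matrix (names invented; k plays the role of a selector like bac_env$zone == "TX"):

```r
set.seed(1)
# A small symmetric dissimilarity matrix with row/column names:
m <- as.matrix(dist(matrix(rnorm(20), nrow = 5)))
rownames(m) <- colnames(m) <- paste0("site", 1:5)

k <- c(TRUE, FALSE, TRUE, TRUE, FALSE)  # logical selector for the sites

# Subset rows AND columns with the same index, then convert back:
sub <- as.dist(m[k, k])
attr(sub, "Labels")  # "site1" "site3" "site4" -- labels are preserved
```

Unlike subset() on a dist object, which returns a bare numeric vector, this keeps the labels, so the result is a valid dist object ready for metaMDS or similar.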
Re: [R-sig-eco] Heat Map for species - code from Numerical Ecology with R
Vegemite is vegemite because it is so condensed. It is condensed because it uses only one column for data and no separators between columns. If you want to print your full data, just do so: you do not need any fancy functions for the purpose. If you want to reorder your data, learn how to use row and column indices. For vegemite you need one-character scaling, and vegemite provides some popular alternatives. You are not limited to those, nor to numeric characters, but you can roll your own. If you do not provide one-character data, vegemite gives an error message that is intended to be informative: it says what the problem was and suggests what to do (use scale). See ?vegemite. Vegan provides an interface to the standard graphical heatmap. This sister of vegemite is called tabasco (because it is also condensed but hotter than vegemite). It takes any non-negative data, but you may need scaling to see shades of colours. Names can also be unreadable in large tables. Fixes are welcome. I think ade4 may have something similar. Cheers, Jari Oksanen

original message
From: Basil Iannone
Sent: 24.07.2013, 14:57
To: Sarah Goslee
Cc: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Heat Map for species - code from Numerical Ecology with R

Valerie, I too ran across this problem. vegemite operates on data with a single digit, such as Braun-Blanquet cover estimates. Thus the vegemite command will not work on data such as environmental data or biomass data unless you convert it to a single digit, causing you to lose a lot of information. So in short, vegemite is a method used to compile and arrange single-digit cover estimates into a table. If you find a similar function that works on other kinds of data, please let me and the other R users know. And someone please correct me if my assessment of the situation is incorrect.
On Wed, Jul 24, 2013 at 6:14 AM, Sarah Goslee sarah.gos...@gmail.com wrote: Hi Valerie, Did the suggestions given on the R-help list fail to work? You do need to provide your abundance data as one character, as specified in ?vegemite, which also gives suggestions for conversion. Sarah On Tuesday, July 23, 2013, Valerie Mucciarelli wrote: Hello, I am relatively new to R and I am working through the code that is provided in the book Numerical Ecology with R: http://xa.yimg.com/kq/groups/19243105/1919134110/name/Numerical.pdf (pg 79) and I have run across an error message that I can't seem to figure out. I am using the vegan, ade4, gclus and cluster packages. The code is as follows:

# Ordered community table
# Species are ordered by their weighted averages on site scores
or <- vegemite(spe, spe.chwo)

spe is the dataframe; here is part of it:

     AGA   ANT   BON  CAL1   CAL   CER   CRY   DES  EUTH FRY
1  0.420 0.092 0.051 0.000 0.975 0.000 0.111 0.000 0.127   0
2  0.000 0.000 0.007 0.002 0.915 0.000 0.000 0.000 0.151   0
4  0.000 0.008 0.000 0.009 0.124 0.003 0.000 0.000 0.095   0
7  0.000 0.002 0.003 0.002 0.121 0.002 0.000 3.573 0.180   0
12 0.000 0.020 0.000 0.002 0.444 0.001 0.000 0.000 0.242   0
13 8.727 0.000 0.000 0.000 0.743 0.000 0.000 0.000 0.050   0
14 2.163 0.009 0.000 0.003 1.121 0.000 0.000 0.000 0.051   0
15 0.000 0.004 0.000 0.000 0.109 0.000 0.000 0.000 0.007   0
18 9.021 0.018 0.002 0.000 0.286 0.000 0.000 0.000 0.028   0
19 0.000 0.038 0.000 0.019 0.509 0.000 0.000 0.000 0.155   0

spe.chwo came from:

spe.norm <- decostand(spe, "normalize")
spe.ch <- vegdist(spe.norm, "euc")
spe.ch.UPGMA <- hclust(spe.ch, method = "average")
spe.chwo <- reorder.hclust(spe.ch.UPGMA, spe.ch)

and the error is:

Error in vegemite(spe, spe.chwo) :
  Cowardly refusing to use longer than 1 char symbols: Use scale

The data in the dataframe is biomass data recorded to 4 digits. I'm wondering if this code is not working because my data is more than one digit long.
Any suggestions or insight on how to get this code to work would be greatly appreciated. Thank you, Val -- Sarah Goslee http://www.stringpage.com http://www.sarahgoslee.com http://www.functionaldiversity.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Basil V. Iannone III, Ph.D. University of Illinois at Chicago Department of Biological Sciences (MC 066) 845 W. Taylor St. Chicago, IL 60607-7060 Email: bian...@uic.edu Phone: 312-355-0987 Fax: 312-413-2435 http://www2.uic.edu/~bianno2 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
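The two routes discussed above, vegemite with a one-character scale and its graphical sister tabasco, in sketch form on vegan's bundled dune data (assumes vegan is installed; "log" is one of the built-in scales, the one suggested elsewhere in this thread):

```r
library(vegan)
data(dune)

# Compress quantitative abundances to one-character symbols; without
# 'scale', multi-character values trigger the "Cowardly refusing" error.
v <- vegemite(dune, scale = "log")

# Graphical alternative: a condensed heatmap for any non-negative data.
tabasco(dune)
```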
Re: [R-sig-eco] Heat Map for species - code from Numerical Ecology with R
In some cases a lousy email system mandates promoting piracy: when I reply, the previous messages are treated as attachments that cannot be edited. This also means that you must top-post. Kudos goes to Microsoft + Nokia. Cheers, Jari Oksanen

original message
From: THIOULOUSE JEAN
Sent: 24.07.2013, 16:59
To: François Gillet
Cc: ethics.report...@springer.com; r-sig-ecology@r-project.org; Daniel Borcard
Subject: Re: [R-sig-eco] Heat Map for species - code from Numerical Ecology with R

Hi François, Note that Jari and yourself also encourage piracy in the same way ;) Jean On 24 Jul 2013, at 15:16, François Gillet francois.gil...@univ-fcomte.fr wrote: Valerie and Sarah, Thank you for encouraging piracy by posting the URL of an illegal PDF of our book :-( By the way, just look at ?vegemite and you'll easily find the solution to your problem: with the argument scale="log" your numeric data will be converted to 1-char symbols. François --- Prof. François Gillet Université de Franche-Comté - CNRS UMR 6249 Chrono-environnement UFR Sciences et Techniques 16, Route de Gray F-25030 Besançon cedex France http://chrono-environnement.univ-fcomte.fr/ http://chrono-environnement.univ-fcomte.fr/spip.php?article530 Phone: +33 (0)3 81 66 62 81 iPhone: +33 (0)7 88 37 07 76 Location: La Bouloie, Bât. Propédeutique, -114L --- Editor of Plant Ecology and Evolution http://www.plecevo.eu --- 2013/7/24 Sarah Goslee sarah.gos...@gmail.com Hi Valerie, Did the suggestions given on the R-help list fail to work? You do need to provide your abundance data as one character, as specified in ?vegemite, which also gives suggestions for conversion. Sarah On Tuesday, July 23, 2013, Valerie Mucciarelli wrote: Hello, I am relatively new to R and I am working through the code that is provided in the book Numerical Ecology with R (pg 79) and I have run across an error message that I can't seem to figure out.
I am using the vegan, ade4, gclus and cluster packages. The code is as follows:

# Ordered community table
# Species are ordered by their weighted averages on site scores
or <- vegemite(spe, spe.chwo)

spe is the dataframe; here is part of it:

     AGA   ANT   BON  CAL1   CAL   CER   CRY   DES  EUTH FRY
1  0.420 0.092 0.051 0.000 0.975 0.000 0.111 0.000 0.127   0
2  0.000 0.000 0.007 0.002 0.915 0.000 0.000 0.000 0.151   0
4  0.000 0.008 0.000 0.009 0.124 0.003 0.000 0.000 0.095   0
7  0.000 0.002 0.003 0.002 0.121 0.002 0.000 3.573 0.180   0
12 0.000 0.020 0.000 0.002 0.444 0.001 0.000 0.000 0.242   0
13 8.727 0.000 0.000 0.000 0.743 0.000 0.000 0.000 0.050   0
14 2.163 0.009 0.000 0.003 1.121 0.000 0.000 0.000 0.051   0
15 0.000 0.004 0.000 0.000 0.109 0.000 0.000 0.000 0.007   0
18 9.021 0.018 0.002 0.000 0.286 0.000 0.000 0.000 0.028   0
19 0.000 0.038 0.000 0.019 0.509 0.000 0.000 0.000 0.155   0

spe.chwo came from:

spe.norm <- decostand(spe, "normalize")
spe.ch <- vegdist(spe.norm, "euc")
spe.ch.UPGMA <- hclust(spe.ch, method = "average")
spe.chwo <- reorder.hclust(spe.ch.UPGMA, spe.ch)

and the error is:

Error in vegemite(spe, spe.chwo) :
  Cowardly refusing to use longer than 1 char symbols: Use scale

The data in the dataframe is biomass data recorded to 4 digits. I'm wondering if this code is not working because my data is more than one digit long. Any suggestions or insight on how to get this code to work would be greatly appreciated.
Thank you, Val -- Sarah Goslee http://www.stringpage.com http://www.sarahgoslee.com http://www.functionaldiversity.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] error in MetaMDS
On 21/07/2013, at 03:02 AM, Elaine Kuo wrote: Hello, I am trying to run metaMDS but got the following error based on the code. Please kindly indicate the error source; thank you. Elaine

Code:
# Compute the Simpson distance matrices for the dataset (3 hr using 4GB)
library(betapart)
dist.sim <- beta.pair(dataR, index.family = "sor")
class(dist.sim)
betasim.sim <- dist.sim[["beta.sim"]]
class(betasim.sim)
# NMDS
library(vegan)
nms <- metaMDS(betasim.sim, trymax = 100, zerodist = "add", autotransform = FALSE, k = 2)

Error:
Error in if (any(autotransform, noshare > 0, wascores) && any(comm < 0)) { :
  missing value where TRUE/FALSE needed

Elaine, I cannot reproduce this, and there is no information in the message to know what happens. Do you have missing values in betasim.sim? Are betasim.sim really dissimilarities? BTW, vegan has Simpson dissimilarity: use betadiver(x, "sim"). Cheers, Jari Oksanen ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
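Jari's closing suggestion as a hedged sketch (assumes vegan is installed; the bundled dune data stands in for the poster's dataR, and trymax is kept small here):

```r
library(vegan)
data(dune)

# Simpson ("beta.sim") dissimilarity directly in vegan:
d.sim <- betadiver(dune, method = "sim")
class(d.sim)  # a "dist" object, directly usable in metaMDS

nms <- metaMDS(d.sim, k = 2, trymax = 20, autotransform = FALSE)
```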
Re: [R-sig-eco] forward selection RDA after controlling for constraints
On 10/07/2013, at 21:00 PM, Stephen Sefick wrote: Hello all, I would like to run this by everyone and maybe get some hints as to what R functions I could use for this. OK, so I have macroinvertebrate assemblage data from across the SE. I would like to control for geographic distance (lat/long), watershed area, and year before submitting these data to an RDA with the rest of the environmental data using a variable selection technique. Does it make sense to detrend the data using an mlm on Hellinger-transformed abundances with the above env variables as regressors and then submit the residuals to rda with the rest of the env variables I am interested in? Stephen, If you happen to use vegan functions for forward selection, please note that they all (should) take a scope argument that can (should) be a list of lower and upper scopes. Put your controlled variables (distance???, watershed area, year) in the lower scope and these plus the other candidate variables in the upper scope, and there you go. I have used 'should' because I have rarely used these functions myself, and I'm not sure if the lower scope really is implemented in all, but it *should* be: file a bug report if this fails. I have no idea how to have distance in RDA. Well, I have ideas, but none that I have are very good. Using a separate mlm and modelling residuals will not work quite correctly, because that ignores correlations between the groups of variables. Vegan functions do not ignore those. Cheers, Jari Oksanen -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
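A sketch of the lower/upper scope idiom with vegan's ordistep on the bundled dune data (assumes vegan is installed; A1 plays the role of a variable to control for, and Management and Moisture are the candidate variables):

```r
library(vegan)
data(dune)
data(dune.env)

# Lower scope: the controlled terms, always retained in the model.
lwr <- rda(dune ~ A1, dune.env)
# Upper scope: controlled terms plus the candidates for selection.
upr <- rda(dune ~ A1 + Management + Moisture, dune.env)

sel <- ordistep(lwr, scope = list(lower = formula(lwr), upper = formula(upr)),
                direction = "both", trace = FALSE)
formula(sel)  # A1 stays in; candidates enter only if they pass the tests
```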
Re: [R-sig-eco] anova.cca question / missing data in constraining matrix
Dear Colleen, On 03/06/2013, at 22:32 PM, ckellogg wrote: Hello Jari, Thank you for your help with this. The solution you suggested in your second post worked quite well. However, I think another subset of my data is too 'holey', because when I run CCA on this set of environmental variables (or a CCA with the previous environmental variables and the additional ones), I get an error:
toolik250.cca2 <- cca(toolikotus250.ra ~ logtemp+conductivity+pH+logBacProd+DIC+logDCO2+sqrtDCH4+logDOC+sqrtPhosphate+sqrtNitrate+sqrtTDN+sqrtTDP+logPC+logPN+Ca+Mg+logNa+logK+SO4+logChloride+Silica, toolikenv.s, na.action=na.exclude)
Error in predict.cca(x, newdata = excluded, type = "wa", model = "CA") : model "CA" has rank 0
The CCA runs if I use na.action=na.omit, but then when I run the anovas, there is apparently no residual component. For example,
No residual component
Model: cca(formula = toolikotus250.ra ~ logtemp + conductivity + pH + logBacProd + DIC + logDCO2 + sqrtDCH4 + logDOC + sqrtPhosphate + sqrtNitrate + sqrtTDN + sqrtTDP + logPC + logPN + Ca + Mg + logNa + logK + SO4 + logChloride + Silica, data = toolikenv.s, na.action = na.omit, subset = -toolik250.cca2$na.action)
         Df  Chisq F N.Perm Pr(>F)
Model    12 5.3003
Residual  0 0.0000
Yes, probably too many holes. You have no residual variation, which indicates that the number of predictor variables (constraints) is higher than the number of remaining observations. Cheers, Jari Oksanen So, I am thinking that examining the relationship between the microbial community and this subset of environmental variables might not be possible without my first manually curating which samples and variables should be included, correct? Thank you, Colleen -- View this message in context: http://r-sig-ecology.471788.n2.nabble.com/anova-cca-question-missing-data-in-constraining-matrix-tp7578175p7578179.html Sent from the r-sig-ecology mailing list archive at Nabble.com.
Re: [R-sig-eco] anova.cca question / missing data in constraining matrix
On 01/06/2013, at 05:20 AM, Jari Oksanen wrote: The CCA seems to run just fine, but when I attempt to do the post hoc tests such as anova.cca (anova(toolik250.cca, by='terms', perm=999)), I get an error message: Error in anova.ccabyterm(object, step = step, ...) : number of rows has changed: remove missing values? What exactly is occurring here to cause this error - I suspect it must be related to the fact that the environmental data matrix has a lot of missing data. I don't quite understand why it states that the number of rows has changed...changed from what? The number of rows has changed from term to term. That is, you have different numbers of missing values in each term (= explanatory variable), and when rows with missing values are removed for the current model, the accepted observations change from term to term. I admit the error message is not the most obvious one. I must see where it comes from, and how to make it more informative. However, it does give a hint to remove missing values, doesn't it? If you want to have a term-wise test with missing values in terms, you must refit your model for complete.cases. Use argument 'subset' to select a subset with no missing values. Currently I don't know any nice shortcut to do this with the current model, but the following may work (untested), although it is not nice:
keep <- rep(TRUE, nrow(toolikenv.s))
keep[toolik250.cca$na.action] <- FALSE
m2 <- update(toolik250.cca, subset = keep)
anova(m2, by = "terms", perm = 999)
Actually, there is a bit easier way of doing this, because 'subset' can also be a vector of indices, and negative indices can be used to remove observations. If 'toolik250.cca' is a result object with missing observations, then
m2 <- update(toolik250.cca, subset = -toolik250.cca$na.action)
will remove items listed as removed in 'na.action' (NB. the minus sign in 'subset'). The update()d model will be equal to the original model, but with missing data removed.
Cheers, Jari Oksanen
Re: [R-sig-eco] anova.cca question / missing data in constraining matrix
Hello, On 01/06/2013, at 02:10 AM, ckellogg wrote: Hello, I am using the cca function in vegan to examine the relationship between microbial community structure and a (large) suite of environmental variables. My constraining/environmental data matrix has a lot of holes in it, so I have been exploring using the na.action argument. This is the command I am currently using:
toolik250.cca <- cca(toolikotus250.ra ~ julianday+logwindspd_max_1dayprior+lograin_3dayprior+sqrtwindspd_1dayprior+windspd_3dayprior+days_since_thaw+days_since_iceout+days_btw_iceoutandthaw+toolikepitemp_24h+logtemp+conductivity+pH, toolikenv.s, na.action=na.omit)
The CCA seems to run just fine, but when I attempt to do the post hoc tests such as anova.cca (anova(toolik250.cca, by='terms', perm=999)), I get an error message: Error in anova.ccabyterm(object, step = step, ...) : number of rows has changed: remove missing values? What exactly is occurring here to cause this error - I suspect it must be related to the fact that the environmental data matrix has a lot of missing data. I don't quite understand why it states that the number of rows has changed...changed from what? The number of rows has changed from term to term. That is, you have different numbers of missing values in each term (= explanatory variable), and when rows with missing values are removed for the current model, the accepted observations change from term to term. I admit the error message is not the most obvious one. I must see where it comes from, and how to make it more informative. However, it does give a hint to remove missing values, doesn't it? If you want to have a term-wise test with missing values in terms, you must refit your model for complete.cases. Use argument 'subset' to select a subset with no missing values.
Currently I don't know any nice shortcut to do this with the current model, but the following may work (untested), although it is not nice:
keep <- rep(TRUE, nrow(toolikenv.s))
keep[toolik250.cca$na.action] <- FALSE
m2 <- update(toolik250.cca, subset = keep)
anova(m2, by = "terms", perm = 999)
Is there any way to get around having missing data when running the anovas, as you can when running the CCA itself? One other question I have is when I try to run the CCA on all the data in my environmental data matrix (toolikenv.s), not just a subset of variables as I do above, using this command:
toolik250.cca <- cca(toolikotus250.ra ~ ., toolikenv.s, na.action = na.omit)
I get the following error message.
Error in svd(Xbar, nu = 0, nv = 0) : a dimension is zero
What might be causing this error message to be thrown? What does sum(complete.cases(toolikenv.s)) give as a result? Does it give 0? I suspect you have so many holes that nothing is left when you remove rows with any missing values. The message is about an attempt to analyse a zero-dimensional matrix. Thank you so much for your help. Maybe I will just have to filter out the samples with missing environmental data (or filter out some of the variables themselves if they have too much missing data), but I was just hoping to avoid having to do this. The functions can handle missing values, but they handle them by removing the observation. Do you want to lose a huge number of rows? We won't invent values to replace missing data in cca(). Some people have suggested ways to do that, and that is not difficult: just search for imputation in R (for instance, package mice). However, the real problem is how to compare and summarize the multivariate results after imputation. Further, if you have a lot of missing values, nothing may be very reliable. It could be possible to collect together and combine permutation test results after multiple imputation, but better consult a statistician before trying to do this.
Cheers, Jari Oksanen
Re: [R-sig-eco] Functional response type II question, curve fitting and
Fernando, For background reading you may check Stevens, M.H.H. (2009) A Primer of Ecology with R. Use R! Series. Springer, and the associated 'primer' package in CRAN. The book discusses functional responses. Cheers, Jari Oksanen From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] on behalf of Pierre THIRIET [pierre.d.thir...@gmail.com] Sent: 30 May 2013 09:58 To: r-sig-ecology@r-project.org Subject: Re: [R-sig-eco] Functional response type II question, curve fitting and Dear Fernando, I would suggest the function nls(), which fits non-linear models. You can specify your equation by hand along with starting values for your parameters (your guessed values). R includes some self-starting functions for the most commonly used models; it could be easier for you to work with these. Check ?selfStart. According to Wikipedia, the type II functional response is similar to the Michaelis-Menten equation, which is implemented in R as a self-starting function: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/SSmicmen.html By adapting the examples, you could easily fit your model, if the chosen equation is suitable. About statistical comparisons of estimated parameters among curves, I have no idea, but I found, among others, this post: http://stats.stackexchange.com/questions/26611/how-to-test-the-effect-of-a-grouping-variable-with-a-non-linear-model Good luck, Pierre On 30/05/2013 07:49, Luis Fernando García Hernández wrote: Dear all, I am new on the list and to the more complex applications of R, so I ask you to excuse me if my mail is too long. This time I have to do several tasks related to functional response. I want to evaluate the relationship between prey density (x) and prey consumed (y); the idea is to fit the data set to the functional response equation: Na/TP = a/(1+aThN). I have an idea about the value of most of the equation parameters, but I am completely lost about how to fit the data points to this equation.
On the other hand, I have to graph and compare whether there are significant differences in the estimated parameters of the curves for lepidopterans and ants; the final result should look like the attached picture. If any of you have some idea about how to do this, it would be really appreciated. Below you can find the data and the pictures. Thanks in advance
Prey         Density Eaten
Lepidopteran  1  2
Lepidopteran  1  3
Lepidopteran  1  3
Lepidopteran  3  4
Lepidopteran  3  5
Lepidopteran  3  4
Lepidopteran  5  7
Lepidopteran  5  5
Lepidopteran  5  6
Lepidopteran 10  8
Lepidopteran 10 10
Lepidopteran 10  7
Ant           1  3
Ant           1  1
Ant           3  4
Ant           3  2
Ant           5  4
Ant           5  6
Ant          10  5
Ant          10  6
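Pierre's suggestion can be sketched concretely: a type II functional response has the same saturating shape as the Michaelis-Menten curve, so nls() with the self-starting SSmicmen() model is one way to fit it, with the mapping Th = 1/Vm and a = Vm/K recovering Holling's parameters. A hedged sketch using the posted data (whether SSmicmen converges depends on the data, and the simple per-group fit is only one way to compare the two prey types):

```r
## Data from the post
dat <- data.frame(
  prey    = rep(c("Lepidopteran", "Ant"), c(12, 8)),
  density = c(1,1,1,3,3,3,5,5,5,10,10,10, 1,1,3,3,5,5,10,10),
  eaten   = c(2,3,3,4,5,4,7,5,6,8,10,7,   3,1,4,2,4,6,5,6)
)

## Fit a Michaelis-Menten (type II shaped) curve separately per prey type
fits <- lapply(split(dat, dat$prey), function(d)
  nls(eaten ~ SSmicmen(density, Vm, K), data = d))
lapply(fits, coef)

## Holling disc equation parameters from the MM parameterization:
## handling time Th = 1/Vm, attack rate a = Vm/K
```

For a formal comparison of parameters between prey types, one common route is to fit a single nls() model with prey-specific parameters and compare it to the pooled model with anova(), along the lines of the stackexchange post cited above.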
Re: [R-sig-eco] adonis
Alan, A few comments on your procedure. You have two non-standard things in your message: you try to do something that looks like post hoc tests, and you use non-standard contrasts. There is nothing post hoc in your post hoc tests. What you do is break your factor variable into separate contrasts. If you do so, you should carefully read the adonis output, which says "Terms added sequentially (first to last)". If your contrasts are correlated, like they are in the example you gave, the results for individual terms will depend on the order of terms. Usually people associate post hoc tests with the multiple testing problem, but there is nothing about that in the example you gave. It is just simple testing of individual contrasts. The second point is that you used non-standard contrasts. The species coefficients will depend on the contrasts and therefore they change. There are easier ways of doing the same. For instance, you seem to want sum contrasts, but with a different baseline level. Check functions like model.matrix, contrasts, relevel, and as.data.frame. However, the magnitude of a coefficient also depends on the specific contrasts that you use. Cheers, Jari Oksanen On 24/05/2013, at 16:48 PM, Alan Haynes wrote: Hi all, I'm using adonis for some plant community analysis and have been following the BioBucket example of how to do post hoc tests ( http://thebiobucket.blogspot.ch/2011/08/two-way-permanova-adonis-with-custom.html )
data(dune)
data(dune.env)
ad1 <- adonis(dune ~ Management, data=dune.env, permutations=99)
# Call:
# adonis(formula = dune ~ Management, data = dune.env, permutations = 99)
#
# Terms added sequentially (first to last)
#
#            Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
# Management  3    1.4686 0.48953  2.7672 0.34161   0.01 **
# Residuals  16    2.8304 0.17690          0.65839
# Total      19    4.2990                  1.00000
# ---
# Signif.
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
man <- dune.env$Management
contmat <- cbind(c(1,-1,0,0), c(1,0,-1,0), # construct a new contrast matrix
                 c(1,0,0,-1), c(0,1,-1,0), c(0,1,0,-1), c(0,0,1,-1))
contrasts(man) <- contmat[,1:4]
trt1.2 <- model.matrix(~ man)[,2]
trt1.3 <- model.matrix(~ man)[,3]
trt1.4 <- model.matrix(~ man)[,4]
ad2 <- adonis(dune ~ trt1.2 + trt1.3 + trt1.4)
# Call:
# adonis(formula = dune ~ trt1.2 + trt1.3 + trt1.4)
#
# Terms added sequentially (first to last)
#
#           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
# trt1.2     1    0.1483 0.14827  0.8381 0.03449  0.545
# trt1.3     1    0.8371 0.83712  4.7321 0.19472  0.001 ***
# trt1.4     1    0.4832 0.48321  2.7315 0.11240  0.032 *
# Residuals 16    2.8304 0.17690          0.65839
# Total     19    4.2990                  1.00000
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I was just wondering whether it was fair to say that the species with high coefficients (adonis(...)$coefficients) were the ones causing that difference?
ad2$coefficients[3, abs(ad2$coefficients[3,]) > 1]
#    Elepal    Poapra    Salrep    Poatri    Elyrep    Lolper    Alogen
# -1.091667  1.975000 -1.375000      3.28      1.33      3.00      1.65
If so, would it be better to take the coefficients from the original model or the model used for the contrast, as these yield different results:
ad1$coefficients[3, abs(ad1$coefficients[3,]) > 1]
#   Rumace   Tripra   Poatri   Plalan
# 2.316667     1.35 1.516667 1.541667
Cheers, Alan -- Email: aghay...@gmail.com Mobile: +41763389128 Skype: aghaynes
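Jari's suggestion of using the standard contrast machinery instead of a hand-built contrast matrix can be sketched like this (which contrasts reproduce the intended comparisons depends on what the custom matrix was meant to encode, so this only illustrates the mechanics):

```r
library(vegan)
data(dune); data(dune.env)

## Sum contrasts with a chosen baseline level instead of a custom matrix
man <- relevel(dune.env$Management, ref = "SF")  # pick the baseline level
contrasts(man) <- "contr.sum"                    # standard sum contrasts

## Inspect the design columns that the contrasts generate
head(model.matrix(~ man))

ad <- adonis(dune ~ man, permutations = 99)
```

Because the sequential (Type I) sums of squares in adonis depend on term order whenever contrasts are correlated, it is worth refitting with the terms permuted to see how sensitive the individual contrast tests are.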
Re: [R-sig-eco] using Pearson's Chi-squared to verify dependence among species distribution
On 01/05/2013, at 06:09 AM, Antonio Silva wrote: Thanks Zoltan and R-sig list members. I am still left with some doubts. Is there a way to calculate in R the results of the equation X2 = n*[|ad-bc|-(p/2)]^2 / [(a+b)*(c+d)*(a+c)*(b+d)] (see Legendre & Legendre pg 295, eq. 7.6) This should be doable with the vegan::designdist() function. Set 'abcd = TRUE' and you can directly use the terms as you defined them above. I won't give the equation here as I don't know which equation you used: your previous message had a different equation from this one, with some mix-up between 'n' and 'p'. The designdist() function calculates (dis)similarities between rows. You must transpose (t()) your data if you want dissimilarities between columns. The code is pure R, so you can see how to do these calculations yourself. Cheers, Jari Oksanen For the following data:
     ST01 ST02 ST03 ST04 ST05 ST06 ST07 ST08 ST09
SP1     1    1    1    1    1    0    0    1    1
SP2     1    1    0    1    0    1    0    1    0
We have a=4, b=3, c=1, d=1, N=9, and the 2x2 table for this example is
              SP2 Presence  Absent
SP1 Presence         4        3
    Absent           1        1
Following the chi-square formula I got 110.25 / 280 = 0.3937. I have tried many things, without success. Thanks in advance for any suggestion. Antonio Olinto 2013/4/25 Zoltan Botta-Dukat botta-dukat.zol...@okologia.mta.hu Dear Antonio, Try this: chisq.test(table(sp1, sp2)) Best wishes Zoltan On Wed, Apr 24, 2013 at 6:02 PM, Antonio Silva aolinto@gmail.com wrote: Hi, I'm trying to use Pearson's chi-squared to verify the dependence among species distributions. I have a dataframe with the presence/absence data of two species in a number of sample units. The equation I'm using is: X2 = p*(|ad-bc|-p/2)^2 / ((a+b)*(c+d)*(a+c)*(b+d)) where a is the number of double presences (1-1), b is the number of 1-0, c is the number of 0-1, and d is the number of 0-0 (double absences); p is a+b+c+d. Is there a function to calculate it using R? I could not understand how to use the chisq.test function for this. Thanks in advance.
Antonio Olinto -- Zoltán BOTTA-Dukát, Institute of Ecology and Botany, Hungarian Academy of Sciences, Centre for Ecological Research, H-2163 Vácrátót, Alkotmány u. 2-4, HUNGARY. Phone: +36 28 360122/157 Fax: +36 28 360110 botta-dukat.zol...@okologia.mta.hu www.okologia.mta.hu
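The worked example from the thread can be reproduced in a few lines of base R, and designdist() automates the same calculation for all pairs of species when given abcd = TRUE. A sketch (the designdist method string is my translation of the Yates-corrected formula; verify its output against the hand calculation before relying on it):

```r
library(vegan)

## Two species over nine stations, as in the post
sp1 <- c(1,1,1,1,1,0,0,1,1)
sp2 <- c(1,1,0,1,0,1,0,1,0)

## 2x2 contingency counts: a = 1-1, b = 1-0, c = 0-1, d = 0-0
a <- sum(sp1 == 1 & sp2 == 1)   # 4
b <- sum(sp1 == 1 & sp2 == 0)   # 3
c <- sum(sp1 == 0 & sp2 == 1)   # 1
d <- sum(sp1 == 0 & sp2 == 0)   # 1
n <- a + b + c + d              # 9

## Chi-square with Yates correction (L&L eq. 7.6): 110.25/280 = 0.39375
X2 <- n * (abs(a*d - b*c) - n/2)^2 / ((a+b)*(c+d)*(a+c)*(b+d))

## The same for all species pairs at once: species as rows, so designdist()
## (which works between rows) compares species, as Jari notes
comm <- rbind(SP1 = sp1, SP2 = sp2)
D <- designdist(comm,
    method = "(a+b+c+d)*(abs(a*d-b*c)-(a+b+c+d)/2)^2/((a+b)*(c+d)*(a+c)*(b+d))",
    abcd = TRUE)
```

With species as columns in the usual community matrix layout, pass t(x) to designdist() instead, exactly as the reply says.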
Re: [R-sig-eco] envfit and NMDS
Howdy folks, This is the second time this week we have this issue. There are two (or three) separate points: (1) You should not correlate environmental variables with axes in any ordination method. This applies to PCoA, PCA, CA or anything else just as well as to NMDS. You can see this by fitting the vectors: they are rarely parallel to the axes. Even in CCA/RDA, the vectors for constraints are rarely parallel to the axes. (2) The ordination space in NMDS is metric. The non-metric part is the monotonic (non-metric) regression from the metric ordination space to the observed dissimilarities. The observed dissimilarities between sampling units (plots, sites) need not be metric: they can be semimetric or non-metric, but the ordination space derived from them is metric. (3) As a separate issue, it is often better to use fitted surfaces than fitted vectors. Fitted vectors are appropriate when the fitted surface is a plane (a first-degree linear trend surface). This is rarely the case, and this applies to all ordination methods: the fitted surfaces in CA, PCoA or PCA are usually just as non-planar as in NMDS. For point 2: Look at stressplot() of the NMDS result. Here the horizontal axis gives the Euclidean distances in NMDS space -- these are metric. The vertical axis gives the observed dissimilarities -- these can be anything. The fitted line gives the monotonic regression -- this is non-metric. With vegan::metaMDS() the ordination space is not only metric, but strictly Euclidean. We can, and do, rotate the ordination space. As a historic note, the vector fitting code for vegan was based on a Bell Labs document that describes vector fitting for their NMDS (the KYST software). The Bell folks invented NMDS, and they regarded vector fitting as suitable for NMDS from the very beginning. That is, from the 1960s.
Cheers, Jari Oksanen From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] on behalf of Erin Nuccio [enuc...@gmail.com] Sent: 24 April 2013 12:30 To: r-sig-ecology@r-project.org Subject: [R-sig-eco] envfit and NMDS Hello list, I commonly see envfit used for NMDS, and am curious if envfit is considered a non-metric vector fitting tool. This question came up during a conversation with a colleague who only uses envfit with PCoA, because they are concerned that doing this would be problematic for the same reason you are not supposed to correlate environmental variables with NMDS axes (you can't correlate something that's non-metric with a metric variable). To me, it seems like by projecting the metric variable into non-metric space, you're essentially making it non-metric, and the correlation would be fine. If anyone could weigh in and clear up the confusion, that would be great. Thanks, Erin
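The points in the answer can be illustrated on vegan's dune data: fit a vector with envfit(), then overlay a fitted surface from ordisurf() to see whether the planarity assumption behind the vector actually holds. A minimal sketch:

```r
library(vegan)
data(dune); data(dune.env)

## NMDS of the community data (Bray-Curtis by default in metaMDS)
nmds <- metaMDS(dune, trace = FALSE)

## Vector fitting: assumes a linear (planar) trend across the ordination
fit <- envfit(nmds ~ A1, data = dune.env, permutations = 999)
plot(nmds, display = "sites")
plot(fit)

## Fitted GAM surface: relaxes the planarity assumption
ordisurf(nmds ~ A1, data = dune.env, add = TRUE)
```

If the surface contours are roughly parallel, straight, and evenly spaced, the fitted vector is an adequate summary; curved or unevenly spaced contours are the case where point (3) of the answer says the surface is the better tool.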
Re: [R-sig-eco] RE : CCA vs NMDS and ordisurf
I also suggest (as I have suggested before) that you run metaMDS with argument plot = TRUE. The convergence criteria in metaMDS are pretty stringent, but with the plot argument you can see how different the solutions are. The two most typical non-convergence cases are that (1) most points are stable, but there are some outliers that don't find their place in this universe, and (2) your data need more dimensions and you should increase 'k'. Then you should also check the stressplot(). If the fit line shoots right up at the maximum observed dissimilarity, you may need to turn on the 'noshare' argument in metaMDS to trigger stepacross dissimilarities. We claim that this is rarely necessary with the monoMDS engine we currently use, but sometimes it is needed. Without hands on your data it is difficult to guess more. Cheers, Jari Oksanen Sent from my iPad On 22.4.2013, at 22.31, Gavin Simpson gavin.simp...@ucl.ac.uk wrote: I would say that it *is* important, in general. However, you don't say if you retried running `monoMDS` on the Hellinger transformed data (without the Bray-Curtis metric - you should use Euclidean with the Hellinger transformation)? If you didn't, try rerunning without Bray-Curtis and see if it converges. Otherwise, try many more iterations and get vegan to start monoMDS from the best solution from the first set of runs. See ?metaMDS for details. G On Mon, 2013-04-22 at 08:26 +, Aurélie Boissezon wrote: Hello everybody! I didn't imagine that my questions would lead to such a debate among researchers :) . It helps me to get ready for future reviewers' comments. ;) Just one question still open about NMDS (Gavin?): Is it important to reach a convergent solution, since the best solution always ordinates species in a similar way? Because, as I said, even with stricter criteria the analysis doesn't reach a convergent solution. Best regards, Aurélie --- Aurélie Rey-Boissezon Ph-D Student University of Geneva Section of Earth and Environmental Sciences - Institute F.-A.
Forel Aquatic Ecology Group Uni Rondeau Site de Battelle - Bâtiment D 7, route de Drize - 1227 Carouge Geneva Switzerland Tel. 0041 (0) 22379 04 88 aurelie.boisse...@unige.ch http://leba.unige.ch/team/aboissezon.html From: fgill...@gmail.com on behalf of François Gillet [francois.gil...@univ-fcomte.fr] Sent: Saturday 20 April 2013 10:59 To: Gavin Simpson Cc: Aurélie Boissezon; r-sig-ecology@r-project.org Subject: Re: [R-sig-eco] CCA vs NMDS and ordisurf 2013/4/19 Gavin Simpson gavin.simp...@ucl.ac.uk I really don't see why this has to be an either/or situation. I fully agree: direct and indirect gradient analyses are complementary! Sorry for not having stressed that in my short answers... François -- Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
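The advice in this thread (plot the runs, continue from the best solution so far, check the stressplot) can be sketched with vegan's dune data; previous.best restarts the new set of random starts from the best solution already found:

```r
library(vegan)
data(dune)

## First attempt; plot = TRUE overlays successive solutions so you can
## see whether non-convergence is a few wandering outliers or everything
sol <- metaMDS(dune, k = 2, trymax = 50, plot = TRUE)

## Continue from the best solution so far with many more random starts
sol <- metaMDS(dune, k = 2, trymax = 200, previous.best = sol)

## If the fit line shoots up at the largest dissimilarities, consider
## stepacross via noshare, or more dimensions (larger k)
stressplot(sol)
```

This mirrors Gavin's suggestion of "many more iterations starting from the best solution" without hand-rolling the restart loop.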
Re: [R-sig-eco] CCA vs NMDS and ordisurf
Hello folks, Only one point here: On 18/04/2013, at 15:52 PM, Pierre THIRIET wrote: Pay attention in using NMDS. As you said, it is rank-based; this is why fitting environmental vectors to an NMDS biplot is not so appropriate, despite being widely done. I don't see the problem with ordisurf and PCoA or CAP: ordisurf enables you to fit environmental variables that have non-linear relationships with the PCs of distance-based ordinations. This is not true. I have seen this sometimes on the Internet, but it really is not true: the NMDS ordination space is strictly *metric*. In vegan it is even strictly *Euclidean*. So it is absolutely correct to fit vectors to an NMDS ordination. (In MASS::isoMDS you can also have any Minkowski metric, but only Euclidean, i.e. Minkowski with exponent=2, is allowed in vegan even with isoMDS.) What is non-metric is the monotonic regression from the *metric* ordination to any dissimilarity measure. So NMDS finds a metric solution from any dissimilarity measure. -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
Re: [R-sig-eco] Conversion of lat/long coordinates to ETRS89
Daniel, On 20/03/2013, at 10:45 AM, Levy, Yair wrote: Dear Mauricio, I would confirm the use of rgdal::spTransform as appropriate. Also, one could add creating a CRS database as follows:
# European Petroleum Survey Group list creation
EPSG <- make_EPSG()
The geographical ETRS89 CRS is then the following, I believe:
# ETRS89 (= EUREF89 = GRS80)
CRS("+init=epsg:4258")
Actually, ETRS89 / ETRS-TM35FIN for Finland is EPSG code 3067. Applying spTransform following this function's instructions shouldn't be a challenge, I believe. I would strongly recommend comparing your new geographical/cartographical projections the first time you use this function in order to validate their accuracy. The following should work:
library(rgdal)
xy <- data.frame(N = 60, E = 27)
coordinates(xy) <- ~ E + N
proj4string(xy) <- CRS("+proj=longlat")
spTransform(xy, CRS("+init=epsg:3067"))
and it is really useful to validate the results... Cheers, Jari Oksanen (Finland) -----Original message----- From: r-sig-ecology-boun...@r-project.org [mailto:r-sig-ecology-boun...@r-project.org] On behalf of Mauricio Zambrano-Bigiarini Sent: Wednesday 20 March 2013 09:29 To: r-sig-ecology@r-project.org Subject: Re: [R-sig-eco] Conversion of lat/long coordinates to ETRS89 On 20/03/13 08:33, Daniel Flø wrote: Hi, We have several lat/long coordinates for Finland that we would like to convert into a selected projection (such as ETRS89/ETRS-TM35FIN or other). Could someone inform us about which package and function we should use to batch convert, and details in the procedure we should be aware of? ?rgdal::spTransform r-sig-geo is a better place for more specific questions.
Kind regards, Mauricio Zambrano-Bigiarini -- Water Resources Unit, Institute for Environment and Sustainability (IES), Joint Research Centre (JRC), European Commission, TP 261, Via Enrico Fermi 2749, 21027 Ispra (VA), IT. webinfo: http://floods.jrc.ec.europa.eu/ DISCLAIMER: The views expressed are purely those of the writer and may not in any circumstances be regarded as stating an official position of the European Commission. "It is not enough to have knowledge; one must also apply it" (Goethe) We do not have additional information beyond the lat/long coordinates, but this may not be necessary? Daniel Flø PhD Fellow, Forest Health, The Norwegian Forest and Landscape Institute, Pb 115, NO-1431 Ås. Office: (+47) 64 94 90 28 Mobile: (+47) 91 14 52 74 Skype: floe.daniel www.skogoglandskap.no
Re: [R-sig-eco] nested.npmanova -- distance matrices as input?
Erin, R is open source: you can see the source code if in doubt. Looking at the source code, the situation is a bit unclear and depends on details of your data that I don't know. It seems that nested.npmanova() accepts R distance structures of class "dist". If your Unifrac distances inherit from the "dist" class, they will be handled correctly in nested.npmanova(). If they are, as you write, distance matrices, then they will be handled incorrectly: they are accepted silently but treated as if they were raw data matrices. You should get an error message from vegdist() in that case, as it knows no method = FALSE, so it may be that your dissimilarities were correctly handled. However, we don't know, as we do not even know what software (R package, external software) you used to calculate the Unifrac dissimilarities. If 'd' are your distances, see what class(d) says. If it says "dist" (possibly among other classes), you are safe. If your 'd' is not of class "dist", you can try whether as.dist(d) changes it to "dist". Cheers, Jari Oksanen From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] on behalf of Erin Nuccio [enuc...@gmail.com] Sent: 11 March 2013 07:08 To: r-sig-ecology@r-project.org Subject: [R-sig-eco] nested.npmanova -- distance matrices as input? Hello list, Does anyone know whether nested.npmanova can take distance matrices as input? When I read the help file, it specifies that the input for nested.npmanova is a data frame, and it sounds like distance matrices can only be used for nested.anova.dbrda (see below). However, if I try inputting a Unifrac distance matrix and set method = FALSE, it completes without any errors. Did this complete correctly?
formula: Formula with a community data frame (with sites as rows, species as columns and species abundance as cell values) or (for nested.anova.dbrda only) a distance matrix on the left-hand side, and two categorical variables on the right-hand side (with the second variable assumed to be nested within the first). Thank you, Erin
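The class check Jari describes takes one line; a sketch (d is a hypothetical stand-in for a Unifrac matrix read in from external software as a plain square matrix):

```r
## Suppose d was read in as a plain square matrix of Unifrac distances
d <- matrix(c(0.0, 0.4, 0.7,
              0.4, 0.0, 0.5,
              0.7, 0.5, 0.0), nrow = 3)

inherits(d, "dist")   # FALSE: would be silently treated as raw data

d <- as.dist(d)       # convert the square symmetric matrix to "dist"
inherits(d, "dist")   # TRUE: safe to pass to functions expecting distances
```

Doing this conversion explicitly, before calling nested.npmanova(), removes the ambiguity about whether the input was treated as dissimilarities or as a raw data matrix.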
Re: [R-sig-eco] question for the R community : Plot RDA biplot without axis ?
Sarah, I added argument 'axis.bp' to the text.cca and points.cca functions. The upgraded functions can be found at http://vegan.r-forge.r-project.org/ (rev2452), and will probably be included in the next minor release of vegan (2.0-7) scheduled for March, 2013. You can also get the single files from R-Forge, or you can install the devel version of vegan with install.packages("vegan", repos="http://r-forge.r-project.org"). It will take a day at minimum to get the version packaged in R-Forge. Cheers, Jari Oksanen On 26 Feb 2013, at 3:49, Sarah Loboda wrote: Hi, Here's the reproducible example that I made with the dune data. When you do the 4 graphs, you can see that because of the text() function, there is an axis on the right and values appear in the plots on the right side. I understand that it is because of my text() function, but is there a way to delete that axis in the text function? If not, is there another way to plot my data on 4 panels without axes? I don't know what you mean by the body of vegan text.cca. Do you mean in the vegan tutorial? I used col.axis because ann=FALSE as an argument in the plot function does not work, and col.axis seems fine... Thank you very much for your time. I really appreciate your help :D
library(vegan)
library(MASS)
### data
data(dune)
data(dune.env)
### Constrained ordination
dune.hel <- decostand(dune, "hellinger")
dune.cca <- cca(dune ~ A1 + Manure, data=dune.env)
### Plot with 4 panels
par(mfrow=c(2,2))
par(mar=c(0.3,0.3,0.3,0.3))
### plot 1
plot(dune.cca, type = "n", scaling = 2, col.axis="white")
with(dune.env, points(dune.cca, display = "sites", scaling = 2, cex=1.3, col=2))
### When I add the next line, it adds env.
variables as arrows but also adds an axis on the right text(dune.cca, display="bp", col=1, cex=1.1) ###plot 2 plot(dune.cca, type = "n", scaling = 2, col.axis="white", col="grey") with(dune.env, points(dune.cca, display = "sites", scaling = 2, cex=1.3, col=2)) text(dune.cca, display="bp", col=1, cex=1.1) ###plot 3 plot(dune.cca, type = "n", scaling = 2, col.axis="white") with(dune.env, points(dune.cca, display = "sites", scaling = 2, cex=1.3, col=2)) text(dune.cca, display="bp", col=1, cex=1.1) ###plot 4 plot(dune.cca, type = "n", scaling = 2, col.axis="white") with(dune.env, points(dune.cca, display = "sites", scaling = 2, cex=1.3, col=2)) text(dune.cca, display="bp", col=1, cex=1.1) 2013/2/25 Gavin Simpson gavin.simp...@ucl.ac.uk On Mon, 2013-02-25 at 13:18 -0500, Sarah Loboda wrote: Hi, I am having trouble obtaining the ordination graph I want. I want to have 4 RDA biplots on the same page and I don't want to have (or I want to modify) the axis numbers. I want the tick marks on the axes without numbers to maximize the space for each RDA plot. A problem is the call to text() (which calls text.cca()). It doesn't pass on arguments to the underlying axis() calls and hence you can't do what you are trying to do with that function directly. Not sure why you want the axis to be white - that draws an axis, so it will obscure anything drawn before it with white paint. The only solution at the moment will be to modify the vegan:::text.cca() function to change the two calls to axis() at the end of the function definition. I suspect you could just copy the body of vegan:::text.cca and put it into your own function, but I haven't tried it. If that fails due to namespace issues, then use assignInNamespace() to assign your function to the text.cca function in vegan's namespace. See the relevant help pages on how to do this. I'm about to leave the office so I can't help further now, but if you have trouble email back to the list and I'll see about cooking up an example... 
All the best Gavin This seems like a simple task, but I tried different approaches and couldn't figure out how to change my axes. This is my code: par(mfrow=c(2,2)) par(mar=c(0.2,0.2,0.2,0.2)) ### first RDA biplot with(arctic, levels(site)) shapevec <- c(19,19,19,19,19,19,19,19,19,19,19,19,6,6,6,6,6,6,6,6,6,6,6,6) plot(spiders.rda.a, type = "n", scaling = 2, las=1, tcl=0.2, col.axis="white") with(spiders.env.a, points(spiders.rda.a, display = "sites", scaling = 2, pch = shapevec, cex=1.3)) text(spiders.rda.a, display="bp", cex=1.1, col.axis="white", ann=FALSE) it is when I run this line that my y axis appears on the right, but I don't want it ### in yellow, this is what I tried to make it disappear. Putting those arguments in plot() doesn't change anything. What should I add in the text part to make sure that the axis doesn't show up? My intention is to plot my sites as dots without text, and my arrows for environmental variables with the name of each variable. Any other ideas on RDA plots will be greatly appreciated. Thank you very much :D -- Sarah Loboda MSc candidate, Entomology, Department of Natural Resource Sciences McGill University http://insectecology.mcgill.ca
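One workaround while waiting for the new vegan release is to bypass text.cca() entirely: extract the coordinates and draw each panel yourself with base graphics, so the offending axis() calls never happen. This is only a sketch: the made-up `sites` and `bp` matrices stand in for what you would extract with vegan::scores(), and all names are invented.

```r
## Sketch: an ordination panel with tick marks but no axis numbers,
## built entirely from base graphics. 'sites' and 'bp' are placeholder
## matrices standing in for site scores and biplot arrows.
set.seed(1)
sites <- cbind(runif(20, -1, 1), runif(20, -1, 1))
bp    <- rbind(A1 = c(0.8, 0.3), Manure = c(-0.4, 0.7))

plot(sites, type = "n", axes = FALSE, xlab = "", ylab = "", asp = 1)
box()
axis(1, labels = FALSE, tcl = 0.2)  # ticks only, no numbers
axis(2, labels = FALSE, tcl = 0.2)
points(sites, pch = 19, cex = 1.3, col = 2)
arrows(0, 0, bp[, 1], bp[, 2], length = 0.05)
text(bp * 1.1, labels = rownames(bp), cex = 1.1)
```

Because every axis() call is your own, no function can sneak an annotated axis into the panel behind your back.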
Re: [R-sig-eco] Issue with BiodiversityR::nested.npmanova
Kay, This should not work if the function is correctly written. You say that the terms in your formula are in data=env, but there are no variables env$NFac and env$Fac in env -- only NFac and Fac. Cheers, Jari Sent from my iPad On 23.2.2013, at 12.15, Kay Cichini kay.cich...@gmail.com wrote: Hi all, Does anyone have a clue why this is not working: library(BiodiversityR) library(vegan) data(mite) mite <- mite[1:60, ] env <- data.frame(Fac = rep(LETTERS[1:2], each = 30), NFac = rep(letters[1:4], each = 15)) nested.npmanova(mite ~ env$Fac + env$NFac, data = env, method = "jac", permutations = 1000) Regards, Kay -- Kay Cichini, MSc Biol Grubenweg 22, 6071 Aldrans Tel.: 0650 9359101 E-Mail: kay.cich...@gmail.com Web: www.theBioBucket.blogspot.co.at -- [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
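The underlying issue is general R behaviour, not specific to BiodiversityR: a term written as env$Fac is evaluated in the calling environment and keeps the name "env$Fac", so it can never match a column of data = env. A small base-R demonstration:

```r
env <- data.frame(Fac  = rep(LETTERS[1:2], each = 30),
                  NFac = rep(letters[1:4], each = 15))

## bare names are looked up inside 'data' and keep the column names
mf1 <- model.frame(~ Fac + NFac, data = env)
names(mf1)   # "Fac" "NFac"

## 'env$Fac' bypasses 'data' and yields differently named terms, which
## breaks any function that matches formula terms to columns of 'data'
mf2 <- model.frame(~ env$Fac + env$NFac, data = env)
names(mf2)   # "env$Fac" "env$NFac"
```

So the fix is simply to write the formula with bare column names: mite ~ Fac + NFac.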
Re: [R-sig-eco] adonis and temporal changes
On 18/02/2013, at 14:04, Pierre THIRIET wrote: Dear Valérie, If I remember well, your design includes: Isolation categories: 3 levels Sites: nested within Isolation categories (10 levels, a total of 30 sites) How many replicates per site and time? Time: how many years do you have? Only one sampling per year? Within sites and years, were samples random or did you always sample exactly the same area (e.g. permanent quadrats)? For adonis, consider that strata is for constraining permutations, which is different from terms in the formula. Exactly. The 'strata' only influence the permutations and have no effect on the terms defined in the formula. Currently the 'strata' are the only way to constrain the permutations. However, in the R-Forge version of vegan and in vegan 2.2-0 (to be released in April) you can give a permutation matrix as an input to adonis. You can generate the permutation matrix with, say, the shuffleSet function of the permute package. This allows generation of restricted permutations, for instance for time series. The vegan command vegandocs("permutations") will open up the vignette of the permute package for your inspection, and this will give some examples of defining restricted permutations. At some point we will move completely to the permute package, but you can already use its permutation matrices as input with these new and upcoming versions of vegan from R-Forge. Cheers, Jari -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
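To make the idea concrete, here is a minimal base-R sketch of a restricted permutation matrix in which samples are shuffled only within their own site -- the kind of matrix permute::shuffleSet() generates far more flexibly. The 3-sites-times-4-samples layout is invented for illustration.

```r
set.seed(42)
site  <- rep(1:3, each = 4)   # grouping vector: 3 sites, 4 samples each
n     <- length(site)
nperm <- 999

## each row is one permutation of 1:n that never moves a sample
## out of its own site
perm <- t(replicate(nperm, {
  idx <- seq_len(n)
  for (s in unique(site)) {
    within <- which(site == s)
    idx[within] <- sample(within)
  }
  idx
}))
dim(perm)                     # 999 x 12
all(site[perm[1, ]] == site)  # TRUE: the restriction holds
```

A matrix like this (one permutation per row) is what the newer adonis() accepts in place of 'strata'.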
Re: [R-sig-eco] Fwd: vegdist Error en double(N * (N - 1)/2) : tamaño del vector especificado es muy grande
Dear Carolina Bello, You asked this same thing on the general R mailing list, and Brian Ripley answered you on Saturday. The essential things he told you were that you cannot do that with 32G of RAM, and that you should rethink your problem. All we can do here is repeat his message, and Tom Philippi already did so. With N = 138037 you need 71G to store the result, and 32G of RAM is too little. I don't know how much further you would get with vegan::vegdist in R 3.0.0, but at least the error message will change to a Spanish version of Error: cannot allocate vector of size 71.0 Gb. You really should re-think your problem. You need to use methods that can handle large data sets like that, or you need to thin your data. Are your data modelled? At least I find it difficult to believe that you really have observations on 89 species in 138037 grid cells in rugged terrain like the Andes. Cheers, Jari Oksanen On 12/02/2013, at 00:15, Carolina Bello wrote: Hi, I have some problems with the vegdist function. I want to do a hierarchical cluster of 138037 pixels of 1 km^2 from a study area in the Colombian Andes. I have distribution models for 89 species, so I have a matrix with the pixels in the rows and species in the columns, filled with absence (0)/presence (1) of each species per pixel. I think the bigger problem is that for the agglomeration method in the hierarchical cluster I need the whole matrix, so I can't divide it. For this I want to calculate a distance matrix with Jaccard. I have binary data. The problem is that I have a matrix of 138037 rows (sites) and 89 columns (species). 
my script is: rm(list=ls(all=T)) gc() ## clear everything left hidden in memory memory.limit(size = 10) # it gives 1 Tera from HDD in case RAM runs out DF <- as.data.frame(MODELOS) DF <- na.omit(DF) DISTAN <- vegdist(DF[, 2:ncol(DF)], "jaccard") Almost immediately it produces the error: Error en double(N * (N - 1)/2) : tamaño del vector especificado es muy grande I think this is a memory error, but I don't know why, since I have a PC with 32 GB of RAM and 1 Tera of HDD. I also tried to make a dist matrix with the function dist from package proxy: library(proxy) vector <- dist(DF, method = "Jaccard") It starts to run, but when it gets to 10 GB of RAM a window announces that R has encountered an error and will close, so it closes and starts a new session. I really don't know what is going on, and even less how to solve it. Can anybody help me? thanks Carolina Bello IAVH-COLOMBIA -- View this message in context: http://r.789695.n4.nabble.com/vegdist-Error-en-double-N-N-1-2-tama-o-del-vector-especificado-es-muy-grande-tp4658010.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa
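The 71 Gb in the error message is exactly the storage needed for the lower triangle of the distance matrix, which you can verify with base-R arithmetic:

```r
## a "dist" object stores N*(N-1)/2 double values at 8 bytes each
N     <- 138037
cells <- N * (N - 1) / 2
gb    <- cells * 8 / 2^30
round(gb, 1)   # 71: far beyond 32 GB of RAM
```

No amount of tuning will fit that in memory; the data must be thinned or a method that avoids the full distance matrix must be used.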
Re: [R-sig-eco] Adonis and Random Effects
On 04/02/2013, at 09:14, Erin Nuccio wrote: Is adonis capable of modeling random effects? No, adonis does not know about random effects. The significance tests in adonis() are based on permutations. The key question for random effects is: can you design a permutation matrix that treats some variables as if they were random and others as if they were fixed? If your answer is yes, you can have random effects, and if you use the R-Forge versions of vegan (now 2.1-25), you can input your own permutation matrices. If your answer is I don't know, then there is no way of handling random effects. You cannot expect lme4 extensions to the formula to work in other packages. We can parse many fixed-effect formulae to a model matrix, but it is a long way from there to getting permutations to work in some specific way for variables tagged as random factors. Cheers, Jari Oksanen I'm analyzing the impact of a treatment on the microbial community in a split-plot design (2 treatments per plot, 4 plots per grassland, 3 grasslands total). I would like to quantify how much of the variance is due to the Treatment versus the Grassland. It seems like Grassland should be a random effect, since there are thousands of grasslands, and I'm only looking at 3. I have tried to use the notation that works with lme4, and it's not working for me (see below for the formula and error messages). If adonis can't do random effects, are there any alternatives? Or, considering my goal, are there any other programs I should look into? Any suggestions would be highly appreciated! -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Query as db-RDA
On 22/01/2013, at 08:09, ecology_questi...@yahoo.co.jp wrote: I have carried out an ordination program, db-RDA, based on the package vegan. I could show the ordination figure. However, I could not find how the X and Y values of the respective plots and of the arrows of the respective environmental variables were obtained. Please teach me the way(s) and program(s) to get the X and Y values. Hello, They are extracted from the result object using the scores() function. The scores() function is described together with the plot() function (see ?plot.cca, ?scores.cca). The raw scores can be found in the result object (see ?cca.object), but scores() also scales the results as requested (argument 'scaling'). HTH, オクサネ槍 -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
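As a concrete illustration (requires the vegan package; the dune data and the A1 + Management model are just an example, not the poster's actual analysis):

```r
library(vegan)
data(dune, dune.env)
mod <- cca(dune ~ A1 + Management, data = dune.env)

## X and Y coordinates of the site points on the first two axes
site.xy  <- scores(mod, display = "sites", choices = 1:2, scaling = 2)
## coordinates of the arrow heads for the environmental variables
arrow.xy <- scores(mod, display = "bp", choices = 1:2, scaling = 2)
head(site.xy)
```

The same two calls work for a db-RDA fitted with capscale(), since its result object uses the same scores() machinery.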
Re: [R-sig-eco] Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
On 16/01/2013, at 13:50, Duarte Viana wrote: Hi Jesse and Gavin, I remember that post, and I think that turning on the argument LINPACK of the function svd solved the problem. I did so by: 1. pasting the code of function svd (which can be found in the source code of the base package) into the console 2. changing the argument LINPACK=FALSE to LINPACK=TRUE 3. re-assigning the function to the base package with the code: assignInNamespace("svd", svd, "base") However, I do not know whether this is the best solution. Perhaps Gavin can find out what the real problem is. I would appreciate it if you both could keep me informed about any progress. We have had this problem sometimes in svd in vegan functions (rda, cca), and then LINPACK = TRUE has helped. However, this is not a good solution, as LINPACK is being phased out of R. In the current development version (and probably in the next R release later this year), LINPACK is defunct, and this solution won't work. It seems that the basic problem is in the svd function of LAPACK, which is used as the default now and will be the only choice later this year. LAPACK is an external library that is beyond the control of package developers and the R core team, so these problems may be unsolvable. It seems that the problems with the LAPACK code are so common that even the help page of the svd() function warns about them: Unsuccessful results from the underlying LAPACK code will result in an error giving a positive error code (most often ‘1’): these can only be interpreted by detailed study of the FORTRAN code but mean that the algorithm failed to converge. Sometimes rescaling of the data (= multiplying with a suitable constant) has helped, too, and that may be the only thing we can try once the LINPACK = TRUE option disappears within a few months. Best wishes, Jari Oksanen On Wed, Jan 16, 2013 at 11:35 AM, Gavin Simpson gavin.simp...@ucl.ac.uk wrote: Hi Jesse, Can you send me the data *and* *exact* code you used so I can look into this further? 
I promise to delete the data once I have gotten to the bottom of the problem. If you can, please do so *off list*. If you can't, then it might help to scale the numbers a bit, as a range of 5 orders of magnitude may be causing some numerical issues with your data. Note this has nothing to do with vegan; cocorresp is a separate package. Re the last question: it is possible, and IIRC there is some Matlab code to do some of this in the supplementary materials for the ter Braak & Schaffers paper. I got some way towards implementing this in R, but finishing it off went on the back burner and I never got back to it. All the best, Gavin On Tue, 2013-01-15 at 19:09 -0800, Jesse_B wrote: Question 1 - It's been a while, so I don't know who will see this, but I am having the same issue. I have count data from two species matrices (fish and inverts) and I am trying to run CoCA through cocorresp. Symmetric CoCA works fine, and is the main thing that I need, but I would like to be able to switch predictor-response species matrices in a predictive CoCA, to see if there are differing patterns of top-down/bottom-up concordance. I have substantial skew in the data, so I have log+1 transformed both sets of data. I can run crossval on the raw data (not transformed, 99 samples [33 sites over 3 seasons], 72 fish species, 226 invert species, individual numbers are in the same ranges, between 0 and 10K for both fish and inverts), but on the transformed data, I get the Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd' message consistently on the 5th site. I am comfortable _using_ R and the vegan package in particular, but I am not experienced in deeper coding, so I don't have a handle on how to turn LINPACK on. 
R version 2.15.2, vegan 2.0-4, cocorresp 0.2-0 crossval(fishlog, invertlog) LOO - Site: 1 - Complete LOO - Site: 2 - Complete LOO - Site: 3 - Complete LOO - Site: 4 - Complete LOO - Site: 5 Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd' Question 2 - Is it possible to run crossval on matrices for a CCA, to make it a PLS-CCA (as in Schaffers et al. 2008), or am I misunderstanding the process that they used? Thanks in advance! Jesse -- View this message in context: http://r-sig-ecology.471788.n2.nabble.com/Error-in-La-svd-x-nu-nv-error-code-1-from-Lapack-routine-dgesdd-tp7556369p7577802.html Sent from the r-sig-ecology mailing list archive at Nabble.com. ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street
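"Rescaling the data" here just means multiplying the whole matrix by a constant before the decomposition: the singular vectors are unchanged and the singular values scale by that same constant, so nothing is lost. A base-R sketch with invented numbers:

```r
## rescaling before svd() changes only the scale of the singular values
set.seed(1)
x  <- matrix(rnorm(20), 4, 5) * 1e5   # badly scaled data (invented)
s1 <- svd(x)
s2 <- svd(x / 1e5)                    # same decomposition, rescaled
all.equal(s1$d / 1e5, s2$d)           # TRUE
```

When dgesdd fails to converge on the raw matrix, trying the decomposition on x / max(abs(x)) is therefore a harmless first experiment.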
Re: [R-sig-eco] pca or nmds (with which normalization and distance ) for abundance data ?
Claire, Here are some small comments. On 13/12/2012, at 17:24, claire della vedova wrote: Dear all, a) Which ordination method would be better for my data: PCA, knowing that the represented inertia is 35.62%, or NMDS with a stress value of about 0.22? These numbers cannot be used to say which of these methods is better. You need other criteria. Some people may have strong opinions on the choice here, but these opinions cannot be based on these numbers -- they are based on something else (I do have such an opinion, but I abstain from expressing it). b) If NMDS is more suitable, which variant is better: with Hellinger normalization and Bray-Curtis distance, or with the normalization recommended by Legendre and Legendre and Kulczynski distance? The Hellinger transformation was suggested for the Euclidean metric, and normally it is used in PCA/RDA (which are based on the Euclidean metric although they do not explicitly calculate Euclidean distances). I haven't heard of any advantages of the Hellinger transformation with Bray-Curtis dissimilarity. I suggest you don't use it with Bray-Curtis. I don't know if the Kulczyński dissimilarity is any better than, say, the Sørensen dissimilarity (and both seem to be difficult to spell), but certainly it belongs to the same group of usually well-behaving dissimilarities as the variants of Bray-Curtis or Jaccard. c) Is there another method to apply? I'm going to try co-inertia with the ade4 package Certainly there is a high number of methods you can apply, but why? What are you trying to analyse? What are your questions? Cheers, Jari Oksanen -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
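For reference, the Hellinger transformation is easy to write out by hand; this base-R snippet (toy abundance matrix) shows what decostand(x, "hellinger") computes and why it is tied to the Euclidean metric:

```r
## toy abundances: 2 sites (rows) x 3 species (columns)
x <- matrix(c(10, 0, 5,
               2, 8, 0), nrow = 2, byrow = TRUE)

## Hellinger: divide by the site (row) total, then take square roots
hel <- sqrt(x / rowSums(x))

## Euclidean distance on 'hel' is the Hellinger distance between sites,
## which is why the transformation pairs naturally with PCA/RDA
d <- dist(hel)
```

Feeding the transformed matrix into Bray-Curtis instead of Euclidean discards exactly this connection, which is the point of Jari's advice above.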
Re: [R-sig-eco] Vegan metaMDS: unusual first run stress values with large data set
Hello R-Community, First my thanks to Ewan Isherwood, who turned our attention to this issue and sent his data file to us for analysing the situation. It seems that the default convergence criteria are too slack in monoMDS(), which was the ordination engine of metaMDS() in this case. The good news is that you can change those criteria by adding the argument 'sfgrmin' to the metaMDS() call (this is documented in ?monoMDS). The following command seems to work: PSU.NMDS <- metaMDS(PSU.sp, k = 2, sfgrmin = 1e-7, distance = "jaccard") The default was 'sfgrmin = 1e-5', which was so slack that the iteration stopped early and did not really converge close to the solution. With this option you can find that the correct stress is of magnitude 0.029, which is much lower than reported below. Moreover, the stresses of the one-dimensional and two-dimensional solutions are very close to each other. (There was one outlier (P1763E) which only had one species (CHICRA) that occurred in only four other sites and distorted the results.) I advise *against* using zerodist = "add": it is not needed with monoMDS. Identical (distance = 0) sites will have identical scores if you do not use this argument. Using zerodist = "add" is only necessary with MASS::isoMDS(), which is unable to handle zero distances. We have changed the default of 'sfgrmin' in http://www.r-forge.r-project.org/ so that you should not see this problem in the next vegan releases. Cheers, Jari Oksanen On 05/12/2012, at 21:15, Ewan Isherwood wrote: Hello, R-Community! This is the first time writing to this group and indeed the first time using a mailing list, so please bear with me if I've done something wrong. I have a large species x site matrix (89 x 4831) that I want to ordinate using metaMDS in the vegan (2.0-5) package in R (2.15.2). If I run this data frame using the Jaccard index in two or more dimensions (k > 1), the first run (run=0) has a relatively low stress value and the other 20 runs are much higher and have very low deviation. 
However, k=1 seems to work fine. Furthermore, a stress/scree plot reveals a pyramid-like shape, where the k=1 lowest stress value is low, increases rapidly for k=2, then decreases slowly as k increases.

Dimensions  Stress
1           0.1382185
2           0.1939509
3           0.1695375
4           0.1552210
5           0.1406408
6           0.1294149

I've tried this with a small subset of this data and the issue arises at k > 2 rather than at k > 1 as it does here. Anyway, this is the input and output: library(vegan) library(MASS) PSU <- read.table("PSU.txt", header = TRUE, sep = "") PSU.sp <- PSU[, 22:110] PSU.NMDS <- metaMDS(PSU.sp, k = 4, zerodist = "add", distance = "jaccard") Square root transformation Wisconsin double standardization Zero dissimilarities changed into 0.0006657301 Run 0 stress 0.155221 Run 1 stress 0.2548103 Run 2 stress 0.255434 Run 3 stress 0.2551382 … (Up to run 20, where run 1 through run 20 all have very similar stress values.) Call: metaMDS(comm = PSU.sp, distance = "jaccard", k = 4, zerodist = "add") global Multidimensional Scaling using monoMDS Data: wisconsin(sqrt(PSU.sp)) Distance: jaccard Dimensions: 4 Stress: 0.155221 Stress type 1, weak ties No convergent solutions - best solution after 20 tries Scaling: centring, PC rotation, halfchange scaling Species: expanded scores based on 'wisconsin(sqrt(PSU.sp))' Now, again, with k=1 this does not happen – the solution looks like any other regular NMDS run. There are no blank values in the data, as they are all numbers between 0 and 100 corresponding to % cover, and every row and column sum is greater than 0. There are many sites with the same species configurations, hence the zerodist, but omitting this makes no difference to the problem at hand. The NMDS works fine if I use a subset of the data, but I have not subsetted and tested all of it. Other metric (Euclidean) and nonmetric (Bray) dissimilarity indices result in the same effect. I've chosen k=4 here because of the (marginal) elbow in the stress plot, but the data itself actually looks pretty good at any k value. 
Even though the output is reasonable, I am concerned that beating the best solution by a large margin on the first run means something is messing up, and this concern is amplified by the strange pyramid-shaped stress plot. Because metaMDS uses random starts, I don't see how this output is possible. I've scoured the help files and archives of this list and I am really at a loss to explain this. Thank you in advance for your time and consideration! Ewan ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa
Re: [R-sig-eco] summing F stats and permutation
Steve, The code gives you a vector of F-values calculated separately for each species. Taking the sum of that vector gives you the sum of all F-values. For the ordinary F-values (sum the numerators and denominators and take their ratio) you can use the anova or permutest functions in vegan. Cheers, Jari On 29 Nov 2012, at 18:41, Steve Brewer wrote: Jari, Thanks. This is helpful. I knew that RDA obtained predicted abundances for each species, but I didn't know that function rda could be used to calculate F statistics for each predictor for each species. That does seem like a promising way to approach the problem. So is the code you've sent me showing how to extract and sum the numerator and denominator SS and then calculate the ratio from those sums and the dfs? Or is it showing how to obtain Fs for each species and then sum them? Incidentally, I was initially very skeptical of the sumF approach until I started comparing sumF results to perMANOVA results (using PC-Ord) on several different datasets of mine and got strikingly similar results between the two techniques. It seems to go against what a lot of statisticians are saying about the unique value of multivariate stats. I don't have a satisfactory explanation for why it seems to work with my data. Steve J. Stephen Brewer Professor Department of Biology PO Box 1848 University of Mississippi University, Mississippi 38677-1848 Brewer web page - http://home.olemiss.edu/~jbrewer/ FAX - 662-915-5144 Phone - 662-915-1077 On 11/29/12 9:43 AM, Jari Oksanen jari.oksa...@oulu.fi wrote: Steve, This is R, so it is not about whether this can be done, but how this can be done. Unfortunately, doing exactly this requires some fluency in R. Doing something similar is very simple. The description of your problem sounds very much like the description of the permutation test in redundancy analysis (RDA). The difference is that in RDA you sum up the numerators and denominators before taking the ratio, but in your model you sum up the ratios. 
So in the RDA test you have (num_1 + num_2 + ... + num_p)/(den_1 + den_2 + ... + den_p), and in your description you have num_1/den_1 + num_2/den_2 + ... + num_p/den_p. The former test is canned for instance in the vegan package, but the latter you must develop yourself (and the former method of summing variances instead of their ratios feels sounder). It would not take too many lines of code to fit your case, though. Please note that RDA works by fitting linear models for each species independently, so that you can get all needed statistics from a fitted RDA in the vegan package (function rda). The following function extracts F-values by species from a fitted vegan rda() result object:

spF <- function (object) {
    inherits(object, "rda") || stop("needs an rda() result object")
    df1 <- object$CCA$qrank
    df2 <- nrow(object$CA$Xbar) - df1 - 1
    num <- colSums(predict(object, type = "working", model = "CCA")^2)
    den <- colSums(predict(object, type = "working", model = "CA")^2)
    (num/df1)/(den/df2)
}

HTH, Jari Oksanen From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] on behalf of Steve Brewer [jbre...@olemiss.edu] Sent: 29 November 2012 16:42 To: r-sig-ecology@r-project.org Subject: [R-sig-eco] summing F stats and permutation Dear Colleagues, I'm wondering if anyone in this group has developed code for doing a sumF test for examining community responses in an experiment. For those not familiar, sumF is a simple univariate alternative to MANOVA and perMANOVA, wherein univariate ANOVAs and their associated F statistics are calculated for each species' abundance and then the F statistic for each effect is summed over all species. The significance of the resulting summed F statistic is then evaluated using random permutation. The summed F statistic is interpreted as an overall community response to the treatments, whereas the F statistic for each species provides a measure of the contribution that species makes to treatment differences. 
I could envision a variety of ways in which this could be done in R, but I'm not adept enough in R to figure out how to do it myself. One possibility might involve using permute or shuffle to get the randomized data matrices, but it is not clear to me how one could simultaneously calculate the ANOVAs for all species and sum the resulting F statistics for each random permutation. There is no reason why traditional F statistics would have to be used. Pseudo-F statistics based on distances for each species' abundance could be calculated instead and then summed across species. PLEASE NOTE THAT I AM ALREADY AWARE OF THE OBJECTIONS TO THIS APPROACH TO COMMUNITY ANALYSIS. Nevertheless, I am interested in pursuing this using R, if possible. Any suggestions are welcomed. Thanks, Steve J. Stephen Brewer Professor Department of Biology PO Box 1848
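The difference between the two statistics discussed in this thread can be seen with toy numbers (per-species sums of squares invented for illustration; equal degrees of freedom across species are assumed and cancelled out):

```r
num <- c(4, 9, 1)   # per-species numerator SS (invented)
den <- c(2, 3, 4)   # per-species denominator SS (invented)

sum(num / den)       # "sumF" style: sum of per-species F ratios = 5.25
sum(num) / sum(den)  # RDA style: ratio of the pooled SS, about 1.56
```

One species with a tiny denominator can dominate the sum of ratios, while the ratio of sums weights every species by its actual variance, which is the "sounder" behaviour Jari alludes to.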
Re: [R-sig-eco] post-hoc test for adonis PERMANOVA
There is a parametric post hoc test for betadisper(), but there is no post hoc test for adonis(). These two functions are different and study different null models. If anybody figures out how to do a post hoc test for adonis() we may incorporate submitted code into vegan (though personally I think that the whole idea of post hoc tests is weird, to put it mildly). Cheers, Jari Oksanen Sent from my iPad On 25.11.2012, at 0.54, Andre Rovai asro...@gmail.com wrote: Dear friends, I have been recently introduced to R and am dedicating myself to going deeper into it. It can be tricky sometimes for a rookie, though! I ran a PERMANOVA (vegan package) using the adonis() function and tried to go deeper in the analysis by running a post-hoc test (betadisper()). I would appreciate it if someone more familiar with the codes related to the post-hoc analysis could validate the direction I am headed in. Here are the codes I used: # permanova fl.sem.transf.bray <- vegdist(fl, method = "bray") flfat <- read.table("flfator2.txt", header = TRUE, row.names = 1) adonis(fl.sem.transf.bray ~ loc*trat, data = flfat, strata = flfat$trat, permutations = 999) # there was a significant difference for the interaction # post-hoc test phoc <- with(flfat, betadisper(fl.sem.transf.bray, trat)) TukeyHSD(phoc) Thank you! Andre Rovai [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] ANOSIM
On 23/11/2012, at 18:44, aristide.andrianarim...@blueline.mg wrote: Dear list, I want to know if ANOSIM (Global R and p-value) can only be used with a dissimilarity matrix, or also with a similarity matrix. If I use a similarity matrix like Jaccard, is the interpretation of the output still the same: R = 1 completely dissimilar and R = 0 completely similar? Aristide, If you are using the vegan implementation of anosim, the answer is: no, you cannot use anosim() for similarities, but only for dissimilarities. If you refer to the *method* (instead of the *implementation*), the answer is: yes, of course you can use ANOSIM for similarities. You only need to implement it that way (change the signs, reverse the P-values compared to vegan::anosim()). The vegan implementation is made for R, and R deals with dissimilarities. Some packages may change this R convention, but similarities are not R compliant. Cheers, Jari Oksanen -- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
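If you have a similarity matrix in hand, turning it into the dissimilarity that vegan::anosim() expects is a one-liner, assuming similarities bounded in [0, 1] as for Jaccard (toy values):

```r
s <- matrix(c(1.0, 0.8,
              0.8, 1.0), nrow = 2)   # toy similarity matrix
d <- as.dist(1 - s)                  # 0 = identical, 1 = nothing shared
d                                    # 0.2
```

After this conversion the usual interpretation applies directly: large R means the groups are more dissimilar than expected.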
Re: [R-sig-eco] Multivariate ANOVA based on dissimilarities
Mauricio, Nothing reproducible here (we don't have your data sets), not even the output. In your email you only write about adonis() and the p-values you got from adonis. On the other hand, your TukeyHSD() is based on betadisper(), and you never seem to check the betadisper() p-values. These two methods are radically different: adonis() is about differences between groups, and betadisper() is about dispersions within groups. We expect some kind of consistency between betadisper() p-values and the TukeyHSD() of betadisper, but adonis() is quite a different beast which has little to do with betadisper(). If you take an analogy to standard statistics: adonis() is like (M)ANOVA, betadisper() is like Levene's test of homogeneity (whatever that means) of variances, and TukeyHSD() is its sequel for detecting cases with different variances. I suggest you rather use 999 permutations than 1000: you can get nicer round figures. Once we were more patronizing and dropped one permutation if the user asked for an even hundred or thousand, but then we thought that this is up to the user to decide and none of our business. Cheers, Jari Oksanen On 27/08/2012, at 23:23, Maurício Moraes Zenker wrote: Dear all, I'm using the function 'adonis' (multivariate ANOVA based on dissimilarities) to study beta diversity of moths between altitudinal levels and seasons. My analysis is based on the example included in the vegan tutorial "Multivariate Analysis of Ecological Communities in R: vegan tutorial", written by Jari Oksanen. As in the example, the analysis of my data indicated that there are significant differences. However, Tukey's test indicated that there are no differences between groups? Why does this happen? Shouldn't Tukey's test show that one group is different from another? The script is almost the same as the tutorial but with more permutations. Thank you. 
Mauricio

Permanova <- read.table("Permanova.txt", header = TRUE)
PermanovaII <- read.table("PermanovaII.txt", header = TRUE)
betad <- betadiver(Permanova, "z")
adonis(betad ~ Altitude * Season, data = PermanovaII, permutations = 1000)
hov <- with(PermanovaII, betadisper(betad, Altitude))
TukeyHSD(hov)
hovII <- with(PermanovaII, betadisper(betad, Season))
TukeyHSD(hovII)

-- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa
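The location-vs-dispersion contrast can be sketched as follows on vegan's dune data (the poster's Permanova/PermanovaII files are replaced here by example data; adonis2() is the current name of the adonis interface):

```r
library(vegan)
data(dune)
data(dune.env)

## beta diversity (Arrhenius z) as a distance object
betad <- betadiver(dune, "z")

## Location test: do group centroids differ?  (like MANOVA)
adonis2(betad ~ Management, data = dune.env, permutations = 999)

## Dispersion test: do within-group spreads differ?  (like Levene's test)
hov <- betadisper(betad, dune.env$Management)
anova(hov)      # overall homogeneity test -- check this p-value first
TukeyHSD(hov)   # then pairwise comparisons of dispersions
```

A significant adonis2() with a non-significant TukeyHSD() is therefore no contradiction: the former tests locations, the latter tests dispersions.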
Re: [R-sig-eco] IndVal groups Plotting an RDA with labels
This is a minimally edited copy-paste of the vegan help page (here and in all the following code). Don't do this, but *follow* the documentation. If you give arguments like 'const', 'select' or 'arrow.mul', you should give some value to those, or you get an error. Normally you do not need to give these arguments at all, as the functions find the defaults internally. You only must give them if you want to change those defaults (like selecting some other cases than all with 'select'), and then you must give the new values.

# would I put: scores(c1.rda, choices = 1:2, scaling = 2, display = "sp") in for const

This question makes no sense. It seems that a good idea might be that you sit down and peacefully go through some simple introductory example (like that in vegandocs("intro")). Of course, there is PC-Ord...

Cheers, Jari Oksanen

## S3 method for class 'cca':
points(x, display = "sites", choices = c(1, 2), scaling = 2, arrow.mul, head.arrow = 0.05, select, const, ...)
## S3 method for class 'cca':
scores(x, choices = c(1, 2), display = c("sp", "wa", "cn"), scaling = 2, ...)
## S3 method for class 'rda':
scores(x, choices = c(1, 2), display = c("sp", "wa", "cn"), scaling = 2, const, ...)
## S3 method for class 'cca':
summary(object, scaling = 2, axes = 6, display = c("sp", "wa", "lc", "bp", "cn"), digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'summary.cca':
print(x, digits = x$digits, head = NA, tail = head, ...)
## S3 method for class 'summary.cca':
head(x, n = 6, tail = 0, ...)
## S3 method for class 'summary.cca':
tail(x, n = 6, head = 0, ...)

-- Caitlin Porter, MSc candidate, Biology Department, Saint Mary's University - Halifax, Office: S320, (902) 719-4815
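The advice above ("the functions find defaults internally") can be sketched with a small worked RDA on vegan's dune data; note that 'display' takes character strings, and that 'const', 'select' and 'arrow.mul' are simply omitted so the defaults apply:

```r
library(vegan)
data(dune)
data(dune.env)

## an RDA constrained by two example variables
c1.rda <- rda(dune ~ A1 + Management, data = dune.env)

## plot with all defaults: no const, select or arrow.mul needed
plot(c1.rda, display = c("sites", "sp"), scaling = 2)

## extracting raw species scores -- 'display' is a quoted string
spp <- scores(c1.rda, choices = 1:2, scaling = 2, display = "sp")
head(spp)
```

Only pass 'select' (a logical or index vector) or 'const' when you actually want to override the internally computed values.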
Re: [R-sig-eco] metaMDS - avoid species overlap in plots
Hello,

This is so frequently asked that it is a FAQ. Try vegandocs("FAQ") in R and see question 2.18. It is also such a typical problem that it is covered in the vegan intro: try vegandocs("intro") in R after loading vegan and see chapter 2.1. Gavin Simpson has a blog article on plots, but I'm on my phone now and can't easily check the web.

HTH, Jari Oksanen

Original message -- From: Gian Maria Niccolò Benucci, Sent: 25.06.2012, 12:40, To: r-sig-ecology@r-project.org, Subject: [R-sig-eco] metaMDS - avoid species overlap in plots

Hi community, a very simple question: how do I avoid overlapping species labels in a metaMDS() diagram? I used this command:

text(metaMDS_new2, display = "species", cex = 0.6)

but some species are plotted one over the other and it is not easy to read the diagram. Thanks for helping, -- Gian
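Two of the vegan tools the FAQ points at can be sketched as follows, on an example metaMDS fit rather than the poster's metaMDS_new2 object:

```r
library(vegan)
data(dune)

ord <- metaMDS(dune, trace = FALSE)

## 1. orditorp(): writes a label where there is room, a point otherwise
plot(ord, type = "n")
orditorp(ord, display = "species", cex = 0.6)

## 2. ordipointlabel(): optimizes label positions to minimize overlap
ordipointlabel(ord, display = "species", cex = 0.6)
```

orditorp() keeps the plot readable by sacrificing some labels; ordipointlabel() keeps all labels and moves them instead.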
Re: [R-sig-eco] Retrieving canonical coefficients and correlation coefficients according to ter Braak using vegan package cca() function
Joris,

On 14/06/2012, at 17:39, Joris Meys wrote:

Hi all, I get results that differ from the table given by ter Braak. I can explain the differences in sign (that's normal, as the solution of cca is defined only up to sign), and I can believe there would be small differences as we're talking about different ways of estimation, but the canonical coefficients for the second axis differ substantially. My questions: Did I use the correct methods to extract what ter Braak calls canonical coefficients and correlation coefficients? Is there a logical explanation for the big difference in some results?

I haven't checked that paper, and can't do so for several days. However, one essential difference in canonical coefficients between Canoco and vegan::cca is that Canoco standardizes all constraining variables to unit variance whereas vegan uses the unstandardized variables. Does it help if you use

Model <- cca(hs.spec ~ ., data = scale(hs.ev))

?

Cheers, Jari Oksanen

-- Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland jari.oksa...@oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa
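A runnable sketch of the suggestion, with the poster's hs.spec/hs.ev replaced by vegan's varespec/varechem example data. Note that scale() returns a matrix, so wrapping it in as.data.frame() is safer for the 'data' argument; coef() and intersetcor() are the vegan extractors closest to Canoco's canonical coefficients and interset correlations:

```r
library(vegan)
data(varespec)
data(varechem)

## standardize the constraints to unit variance, mimicking Canoco
Model <- cca(varespec ~ ., data = as.data.frame(scale(varechem)))

coef(Model)          # canonical (regression) coefficients of the constraints
intersetcor(Model)   # interset correlations of the constraining variables
```

With standardized constraints, the canonical coefficients should be directly comparable in scale to the Canoco output, up to the arbitrary signs of the axes.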