Re: [R-sig-eco] 'grouping' grouping variable
hi jakub,

I would suggest starting by standardizing your environmental variables with scale(), then computing Euclidean distances with e.g. vegdist() in {vegan} (set method = "euclidean", since the default there is Bray-Curtis) and running a cluster analysis on the distance matrix with hclust(). Choose a cutoff for minimum dissimilarity between groups and group your sites accordingly. If you happen to have an idea about the number of groups you expect, then kmeans() may be an alternative.

cheers, gabriel

On 12/16/11 1:14 AM, Jakub Szymkowiak wrote:
> Hello,
> I have a problem and I don't know how to solve it. I have one grouping variable (16 regions in my country). Every region is described by several environmental variables, for example arable field area, woodland area or meadow area. I want to combine these regions into a small number of groups so that similar regions (in terms of my environmental variables) end up in the same group. Any hints on how I can solve this?
> Cheers, Jakub

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
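The steps suggested in the reply can be sketched roughly as follows. This is only an illustration: the data frame `env`, its column names and all numbers are made up, the linkage method and the cut levels are arbitrary choices, and base-R dist() stands in for vegdist().

```r
# Toy data: 16 regions x 3 land-cover variables (invented numbers, illustration only)
set.seed(1)
env <- data.frame(
  arable   = runif(16, 10, 80),
  woodland = runif(16, 5, 60),
  meadow   = runif(16, 1, 30),
  row.names = paste0("region", 1:16)
)

env_std <- scale(env)                     # zero mean, unit variance per variable
d       <- dist(env_std)                  # Euclidean distances
hc      <- hclust(d, method = "average")  # the linkage method is a choice, not a given
plot(hc)                                  # inspect the dendrogram before cutting

groups_h <- cutree(hc, h = 2)  # cut at a dissimilarity threshold ...
groups_k <- cutree(hc, k = 4)  # ... or ask for a fixed number of groups

# Alternative when the number of groups is known in advance
km <- kmeans(env_std, centers = 4, nstart = 25)
table(hclust = groups_k, kmeans = km$cluster)  # compare the two partitions
```

The cross-table at the end is just a quick way to see how far the two partitions agree; neither is "the" correct grouping.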
Re: [R-sig-eco] post hoc in Kruskal Wallis
Jakub,

Do pairwise Wilcoxon tests, then adjust P-values with p.adjust() (pairwise.wilcox.test() does both in one call). This would be the classical frequentist follow-up.

cheers, gabriel

On 11/23/11 6:21 PM, Jakub Szymkowiak wrote:
> Hi,
> does anyone know how I can perform post-hoc tests (especially Least Significant Difference and the Scheffé test) for results from a Kruskal-Wallis test? In the Kruskal-Wallis test I found some significant differences between the tested groups, but I want to know between which groups the difference is really significant.
> Cheers, Jakub

--
Dr. Gabriel Singer
Department of Limnology - University of Vienna
and Wassercluster Lunz Biologische Station GmbH
+43-(0)664-1266747
gabriel.sin...@univie.ac.at
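A minimal sketch of that follow-up, using the built-in airquality data as a stand-in (ozone by month, five groups). The Holm correction and exact = FALSE (which avoids warnings about ties) are choices made for this example, not part of the original advice.

```r
# Overall test: does ozone differ among months?
kw <- kruskal.test(Ozone ~ Month, data = airquality)
print(kw)

# Pairwise Wilcoxon tests with Holm-adjusted P-values in one call
pw <- pairwise.wilcox.test(airquality$Ozone, airquality$Month,
                           p.adjust.method = "holm", exact = FALSE)
print(pw)

# The same adjustment done by hand with p.adjust()
raw    <- pairwise.wilcox.test(airquality$Ozone, airquality$Month,
                               p.adjust.method = "none", exact = FALSE)$p.value
p_raw  <- raw[!is.na(raw)]            # lower triangle: 10 pairwise comparisons
p_holm <- p.adjust(p_raw, method = "holm")
```

Any other method accepted by p.adjust() (e.g. "bonferroni", "BH") can be swapped in.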
Re: [R-sig-eco] interpreting adonis results
... dangerous wording, there could in fact be a location effect of 'location' and/or a dispersion effect of 'location'. Gian, I suggest you add a test of a dispersion effect using the function betadisper(), then you know a bit more about the type of effect.

gabriel

On 11/16/11 11:02 PM, Gavin Simpson wrote:
> On Wed, 2011-11-16 at 03:43 +0100, Gian Maria Niccolò Benucci wrote:
>> Hi all,
>> I had 84 samples collected at 7 different sites. In each sample the fungal species were identified and recorded. I would like to test whether there is a real difference between the sites and whether there is some kind of site effect that structures the fungal communities... So I ran an adonis test:
>>
>> adonis(community.sq ~ location, data=env.table, permutations=999)
>>
>> Call:
>> adonis(formula = community.sq ~ location, data = env.table, permutations = 999)
>>
>>           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
>> location   6    12.593 2.09886  6.8867 0.34922  0.001 ***
>> Residuals 77    23.467 0.30477         0.65078
>> Total     83    36.060                 1.0
>> ---
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> The result is R2 = 0.349 at P = 0.001. Can I conclude that there is a strong site effect structuring the communities at each site?
>
> Depends. The test is one of no effect of `location`. You have found evidence against this hypothesis and thus could reject it, instead accepting the alternative hypothesis that there is an effect of `location`.
>
> As to the strength of this effect? ~35% of the sums of squares can be explained by `location`. Substantially more of the variance remains unexplained. As I know nothing about your subject area, I am unable to comment further on the strength of the relationship. Many ecologists whose work I read would say an effect is significant if the p-value was <= 0.05. Not that I subscribe to this way of working, but by that criterion you have identified a significant `location` effect.
>
> HTH
> G
>
>> Thanks for helping,
>> G.
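The combination suggested in the reply (a permutational test plus a separate dispersion test) might look like the sketch below. The poster's data are not available, so vegan's bundled dune data and its Management factor stand in for community.sq and location; adonis2() is used because it is the adonis interface in current vegan releases.

```r
library(vegan)
data(dune); data(dune.env)   # example data shipped with vegan

d <- vegdist(dune, method = "bray")

# Permutational MANOVA: sensitive to differences in location AND in dispersion
fit <- adonis2(d ~ Management, data = dune.env, permutations = 999)
print(fit)

# Dispersion only: distances of sites to their group centroid
bd <- betadisper(d, dune.env$Management)
print(permutest(bd, permutations = 999))
boxplot(bd)                  # visual check of within-group spread
```

If the dispersion test is clearly non-significant, a significant adonis result is easier to read as a shift in centroids; if both are significant, the two kinds of effect cannot be separated from the adonis table alone.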
Re: [R-sig-eco] Change in rotated NMDS scores as a response variable
hmmm... I think Gavin´s approach definitely has more power, though I don´t quite see why the original idea should not work. Orthogonality is not an implicit feature of an NMDS but it´s also not "prevented"... First, I think quite often NMDS still reproduces/extracts orthogonal features of a dataset. Second, even if NMDS does not care for orthogonality, a "specific" feature of the dataset (say, the "moisture information" in herb data) can behave more or less linearly or at least monotonic in *any* direction on a 2D-plane, in which case the extraction of a rotated axis makes complete sense. However, even in this case an ordisurf fit will greatly help to understand if that´s a legitimate and reasonable approach as I understand. gabriel On 3/10/11 1:04 PM, Gavin Simpson wrote: On Fri, 2011-02-18 at 10:41 -0800, Erik Frenzel wrote: Hello all, I'm interested in adapting a technique from a recent paper Harrison, S., E. I. Damschen and J. B. Grace 2010. Ecological contingency in the effects of climate change on forest herbs. Proceedings of the National Academy of Sciences (USA), 107: 19362-19367. In which a plot's change in NMDS scores over time was used as a response variable: "To measure the overall resemblance of any given herb community to communities found in warm (steep, southerly) versus cool (moderate, northerly) topographic microclimates, we used an ordination approach (also see 28). We ordinated the herb data using NMS ordination in PC-ORD version 4.14 (39), excluding species found in<5% of samples. We rotated axis 1 of the ordination to maximize its correlation with Whittaker’s topographic moisture gradient, so that a low axis 1 score indicated a community in a mesic environment such as a moderate north-facing slope, and a high axis 1 score indicated a community in a warm environment such as a steep south-facing slope. 
Under a warming climate, we expect the community at any given site to show a higher axis 1 score in 2007–2009 than in 1949–1951, indicating that herb composition has shifted over time in the same direction that composition changes over space from mesic (cooler and moister) to xeric (warmer and drier) topographic microclimates. For each site we calculated the difference between its 1949–1951 and 2007–2009 axis 1 ordination scores. In this case, a high value means a community that has shifted to become more dominated by xeric-adapted species." Jari Oksanen has a post on the the r-forge page (https://r-forge.r-project.org/forum/message.php?msg_id=1311&group_id=68) warning against using rotated NMDS scores in a Structural Equation Model. Are there problems with using a "change in scores" as a response variable in this kind of hypothesis testing? I'm genuinely underwhelmed by this approach. i) there isn't such a thing as nMDS axes so does it make sense to take some 1-d coordinate system out of a 2-d coordinate system and relate it to an external variable? It would be like trying to identify patterns in all the cities of the world on the basis of what line of longitude they happened to lie on. Where this sort of thing does make sense is in methods that do identify orthogonal components from a data matrix such that axis 1 explains a component of the variation in the data, and axis 2 another, different (orthogonal) component of the variation. If this were me, I would have taken the 2-d nMDS configuration and fitted a response surface for Whittaker's topographic moisture into the ordination (using ordisurf) and then take the fitted values of the response surface for each site as the species-related topographic moisture "information", which could be plotted as a function of time. HTH G This was done in PC Ord. Has anyone used "metaMDSrotate" in vegan to do this kind of analysis in R? Does anyone have any examples or code they'd be willing to share or point me to? 
Thanks, Erik
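Both ideas discussed in this thread can be tried in vegan; the following is a rough sketch only. It assumes the dune example data, with the numeric variable A1 (soil horizon thickness) standing in for Whittaker's topographic moisture gradient, and it uses MDSrotate(), the name under which the rotation function is available in current vegan releases.

```r
library(vegan)
data(dune); data(dune.env)

set.seed(42)
ord <- metaMDS(dune, distance = "bray", k = 2, trymax = 50, trace = FALSE)

# Option 1 (the paper's approach): rotate so that axis 1 is parallel to the gradient
rot   <- MDSrotate(ord, dune.env$A1)
axis1 <- scores(rot, display = "sites")[, 1]
cor(axis1, dune.env$A1)      # correlation of rotated axis 1 with the gradient

# Option 2 (Gavin's suggestion): fit a response surface and use its fitted values
surf      <- ordisurf(ord ~ A1, data = dune.env, plot = FALSE)
fitted_A1 <- fitted(surf)    # one fitted gradient value per site
plot(dune.env$A1, fitted_A1, xlab = "observed A1", ylab = "surface fit")
```

With repeated surveys of the same plots, the per-site differences in either axis1 or fitted_A1 between survey periods would be the "change" variable discussed above; whether that is a sensible response is exactly what the thread debates.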
[R-sig-eco] cluster defined by environment followed by mrpp
Hi list, Conducting sort of an opinion poll among list members. Start with two matrices, one environmental, one species, same sites. I wondered what people think of defining groups by a cluster analysis based on the environmental variables (say, hclust or similar). Then testing for a difference among those groups with regard to the second species data set (say, adonis or mrpp). I guess the spatial people will feel hurt at least? The strategy seems to be quite common, though. Would be nice to hear opinions. cheers, gabriel
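For readers who want to see the strategy in question concretely, here is a minimal sketch. It assumes vegan's varespec (species) and varechem (environment) data for the same 24 sites; the linkage method and the choice of three groups are arbitrary. It illustrates the procedure being polled, not an endorsement of it.

```r
library(vegan)
data(varespec); data(varechem)   # species and environment, same sites

# Step 1: groups defined from the environmental matrix only
hc  <- hclust(dist(scale(varechem)), method = "average")
grp <- factor(cutree(hc, k = 3))
table(grp)

# Step 2: test whether species composition differs among those groups
fit <- adonis2(varespec ~ grp, data = data.frame(grp = grp),
               method = "bray", permutations = 999)
print(fit)

# mrpp() is the alternative mentioned in the post
print(mrpp(varespec, grp, distance = "bray"))
```

Note that the groups are themselves estimated from data, and any spatial structure in the sites is ignored by the default permutations.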
Re: [R-sig-eco] is 1 hour long enough to assume independance?
hi chris,

I think you are not quite giving us enough information to assess this situation. Otherwise, I'd think that any data coming from ONE dingo (i.e. one radiocollar) will never be independent; the 1 hour is not the problem. Or can you tell otherwise?

gab

On 7/21/10 3:52 AM, Chris Howden wrote:
> Morning All,
>
> I'm doing a Resource Selection Function analysis on dingos and we are having a bit of a debate on independence. We're using a landscape unit of 40x40m (from a GIS) and have radio-collar data every 1 hour. So we can put a dingo in a specific 40x40 grid cell every hour.
>
> I'm concerned about the independence of the data since it's only 1 hour apart. As such I'm proposing we split each day up into 4 periods (dawn, dusk, night and day) and randomly sample 1 fix from each. I feel that this data will be independent. There is also evidence that dingos act differently in these 4 periods, which further increases the chance of independence.
>
> I was wondering what people thought? Is 1 hour far enough apart to assume independence? Is splitting the day into 4 periods and randomly sampling far enough apart to assume independence? Or is even that too close, and should it be further apart, like 1 day?
>
> Chris Howden
> Founding Partner
> Tricky Solutions
> Tricky Solutions 4 Tricky Problems
> Evidence Based Strategic Development, IP development, Data Analysis, Modelling, and Training
> (mobile) 0410 689 945
> (fax / office) (+618) 8952 7878
> ch...@trickysolutions.com.au
>
> -Original Message-
> From: r-sig-ecology-boun...@r-project.org [mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of Kingsford Jones
> Sent: Monday, 19 July 2010 4:40 AM
> To: lgj200306
> Cc: r-sig-ecology@r-project.org
> Subject: Re: [R-sig-eco] A question about PCNM analysis
>
> lgj200306,
>
> You didn't tell us, but since the problem was 'all the same' on both machines I'm guessing both instances used a 32bit build of R under Windows. If so, you'll be able to access, at most, about 3.5Gb of RAM (see RW-FAQ 2.9). The best solution is to upgrade to a 64bit build (IMO preferably Linux, but a 64bit Windows port is now on CRAN).
>
> You can also manage memory more carefully. E.g., the error indicates there's no contiguous block of memory to hold an object of size 190.7Mb at the time the error is thrown. That may be because all RAM is allocated, or because of fragmentation. R holds everything in memory, so when working with large objects in a restricted setting you'll want to write unneeded objects to disk, clean up, and reload when needed (see ?save, ?load, ?rm, and ?gc). More info can be found at ?Memory and by Googling: R memory management.
>
> Also, for some cases there are R packages that facilitate memory management: ff, bigmemory, biglars, bigtabulate, biganalytics, biglm,...
>
> Kingsford Jones
>
> On Sun, Jul 18, 2010 at 4:14 AM, lgj200306 wrote:
>> Hi all,
>> I want to do a PCNM analysis using the vegan and PCNM packages. My R code is as follows:
>>
>>   bci10m = data.frame(x = rep(1:100, each = 50), y = rep(1:50, times = 100))
>>   bci10m.d = dist(bci10m)
>>   library(PCNM)
>>   pcnms10m.analysis1 = pcnm(bci10m.d)   # code 1: pcnm() from the vegan package
>>   pcnms10m.analysis2 = PCNM(bci10m.d)   # code 2: PCNM() from the PCNM package
>>   bci20m = data.frame(x = rep(1:50, each = 25), y = rep(1:25, times = 50))
>>   bci20m.d = dist(bci20m)
>>   pcnms20m.analysis1 = pcnm(bci20m.d)   # code 3
>>   pcnms20m.analysis2 = PCNM(bci20m.d)   # code 4
>>
>> The result shows that codes 1, 3 and 4 are all OK; I can get what I want using these three commands. However, code 2 can't be carried out. The error message is: "cannot allocate vector of size 190.7 Mb". I asked a professor about this; he told me that maybe my computer's memory was not enough and suggested switching off the calculation of Moran's I. Then I reran the code on another, more powerful computer. The problem was still the same. I don't know the reason.
>>
>> Another question: if I want to know how many PCNM eigenvectors have a Moran's I higher than the expected Moran's I after running code 1, how can I achieve that in R?
>>
>> Thanks for your attention!
>> 2010-07-18
>> lgj200306
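The 190.7 Mb in the quoted error message matches the size of one full 5000 x 5000 matrix of doubles, i.e. the 10 m grid. The short sketch below reproduces that arithmetic and shows the save/rm/gc/load pattern from the quoted advice; the helper full_matrix_mb() and the object big are made up for this illustration.

```r
# The two grids from the quoted code
bci10m <- data.frame(x = rep(1:100, each = 50), y = rep(1:50, times = 100))
bci20m <- data.frame(x = rep(1:50,  each = 25), y = rep(1:25, times = 50))

# Size in Mb (2^20 bytes) of a full n x n matrix of doubles (8 bytes per value)
full_matrix_mb <- function(n) n^2 * 8 / 2^20

mb10 <- full_matrix_mb(nrow(bci10m))   # 5000 points
mb20 <- full_matrix_mb(nrow(bci20m))   # 1250 points
round(c(mb10, mb20), 1)                # 190.7 and 11.9

# Park a large object on disk, free the memory, reload it when needed
big <- matrix(0, 2000, 2000)
tmp <- tempfile(fileext = ".RData")
save(big, file = tmp)
rm(big); invisible(gc())
load(tmp)                              # 'big' is back in the workspace
```

Several such matrices are typically alive at once during an eigen-decomposition, which is why a single 190.7 Mb allocation can fail on a 32-bit build even though the machine has more RAM installed.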
Re: [R-sig-eco] change arrow colour when plotting rda
Don't know how to solve the problem with the default plots... but you may want to consider plotting the arrows manually using either arrows() in package graphics or Arrows() in package shape. The latter has a couple of aesthetically appealing options :-)

cheers, gabriel

On 07/05/2010 00:09, Devoto Mariano wrote:
> Dear all,
> can anyone please tell me how to change the colour of the arrows of the
> environmental variables when plotting a constrained ordination done in rda?
> They are in blue, but I want black.
> I've tried every possible combination I can think of in plot(), arrows(),
> ordiplot() and plot.cca() and none of them would work.
> Thanks in advance for your reply,
> M.
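A minimal sketch of the manual approach with arrows(), assuming vegan's varespec/varechem example data and an arbitrary three-variable model. The arrow multiplier mul is a crude home-made scaling for this example, not what vegan's own plot method computes.

```r
library(vegan)
data(varespec); data(varechem)

mod <- rda(varespec ~ N + P + K, data = varechem)

# Draw the sites only, so that no default (blue) arrows are added
plot(mod, display = "sites", type = "p")

# Biplot scores of the constraints, then arrows drawn by hand in black
bp  <- scores(mod, display = "bp", choices = 1:2)
mul <- 0.8 * min(abs(par("usr"))) / max(abs(bp))   # stretch arrows to the plot region
arrows(0, 0, mul * bp[, 1], mul * bp[, 2], col = "black", length = 0.08)
text(1.1 * mul * bp, labels = rownames(bp), col = "black")
```

Because the arrows are ordinary graphics calls, col, lwd, lty and the arrowhead length can all be set freely.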
Re: [R-sig-eco] NMS axis variance and legend
dear alida,

legend() should help to get the legends, just ask for help(legend), it's pretty easy.

then for the variance explained: with an NMS the only measure of fit you get is the stress value, there isn't anything like a percentage of explained variance. you may want to regard the stress value as the percentage of distances among points that is not reproduced by the ordination. Though that's a bit of a sloppy way to see it.

keep struggling, it's worth it :-)

cheers, gabriel

On 15/04/2010 22:19, Alida Mercado wrote:
> Hello,
>
> I'm doing an NMS, and have decided to try out vegan and do everything
> in R. In this attempt, I haven't been able to figure out how to get
> the variance explained by each axis, nor the total variation
> explained by the ordination. The other issue I have is how to get a
> legend, because I want to represent different sites in the
> ordination, but the sites are grouped by the season in which they were
> sampled in order to look at differences in seasonality. Therefore I
> would like to have a legend that represents each season by a different
> symbol. I've decided to start using R, so there are still some
> issues I have to figure out and learn the commands and functions to
> do what I need.
>
> If you have any suggestions, please let me know,
>
> Thanks in advance,
> Alida
>
> ~~~
> Alida Mercado Cárdenas
>
> Ph.D. Candidate, Entomology-Neotropical Environment Option
> McGill University & Smithsonian Tropical Research Institute
>
> http://weevils-n-dreams.blogspot.com/
> ~~~
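Both points from the reply (stress as the fit measure, legend() for group symbols) can be sketched as below. The dune data and the three-level factor Use are stand-ins for the poster's sites and seasons; the symbol codes are arbitrary.

```r
library(vegan)
data(dune); data(dune.env)

set.seed(1)
ord <- metaMDS(dune, distance = "bray", k = 2, trace = FALSE)
ord$stress                     # the fit measure; there is no "% variance" per axis

grp  <- dune.env$Use           # stand-in for "season"
syms <- c(16, 17, 15)          # one plotting symbol per group level
pts  <- scores(ord, display = "sites")

plot(pts, pch = syms[as.integer(grp)], asp = 1,
     xlab = "NMDS1", ylab = "NMDS2")
legend("topright", legend = levels(grp), pch = syms, title = "Use")
```

Current vegan versions report stress on a 0 to 1 scale; older output based on isoMDS reported it in percent, so the two are not directly comparable without rescaling.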
Re: [R-sig-eco] adonis question
Hi Jaime,

The interactions are just a matter of defining the formula as such, e.g. adonis(dist~factor1*factor2). I suppose a multiple comparison (with the reasoning of a post-hoc test) can just be done using adonis() for pairwise comparisons and then using p.adjust().

Cheers, gabriel

On 12/03/2010 19:12, Jaime Pinzon wrote:
> Hi
> After performing a permutational ANOVA with adonis in vegan, is there a way to do multiple comparisons for significant factors with more than 2 levels, as well as for significant interactions? Any help would be very much appreciated
> Thanks,
> Jaime
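The pairwise idea from the reply could be coded along these lines. This is a sketch under assumptions: the dune data with the four-level factor Management serve as the example, adonis2() is used as the current interface to adonis, and Holm is just one possible choice for p.adjust().

```r
library(vegan)
data(dune); data(dune.env)

set.seed(1)
lev_pairs <- combn(levels(dune.env$Management), 2, simplify = FALSE)

# One adonis2() run per pair of factor levels, keeping the raw P-value
p_raw <- sapply(lev_pairs, function(pr) {
  keep <- dune.env$Management %in% pr
  comm <- dune[keep, ]
  env  <- droplevels(dune.env[keep, , drop = FALSE])
  fit  <- adonis2(comm ~ Management, data = env,
                  method = "bray", permutations = 999)
  fit[["Pr(>F)"]][1]
})
names(p_raw) <- sapply(lev_pairs, paste, collapse = " vs ")

# Correct for multiple testing
p_adj <- p.adjust(p_raw, method = "holm")
cbind(p_raw, p_adj)
```

With small groups the number of distinct permutations limits how small a P-value can get, so the adjusted values for such pairs should be read with care.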
Re: [R-sig-eco] permutation for PCA
maybe PCAsignificance() in package {BiodiversityR} could be of help...

cheers, gabriel

Dragos Zaharescu wrote:
> Hi everyone,
> I have been struggling for a while with performing a permutation/cross-validation test of nonlinear PCA in order to assess the significance of the contribution of the separate variables to the nonlinear PCA solutions. Does anyone have an idea how, or with which package, to do this? Or for ordinary PCA, maybe? Any hint would be much appreciated.
> Dragos
>
> Dragos Zaharescu
> Animal Anatomy Laboratory
> Faculty of Biological Sciences
> Vigo University, apd. 137
> 36310, Vigo (Pontevedra), SPAIN
> zaha_dra...@yahoo.com
> zdra...@uvigo.es
> http://webs.uvigo.es/zdragos/
> ~ You should be the change you want to see in the world ~ Gandhi
Re: [R-sig-eco] Fwd: Fwd: how to calculate "axis variance" in metaMDS, pakage vegan?
A difference between two communities within a host could still exist and could make perfect sense, too, when you regard "community" as a random factor. Then "community" may introduce some extra variation (compared to the within-community variation), which from an experimental point of view is interesting and important, because the replication of communities makes sure you are not pseudoreplicating. I am not sure, however, how to declare the correct df for the random factor in adonis in this case... does anybody know better than I do?

Gian Maria Niccolò Benucci wrote:
> Maria,
>
> *...Nevertheless you still do not know if your communities are significantly different between each other, within each host. Now it depends on the hypothesis you intend to test.*..
>
> I think "Community" inside "Host" makes no sense... because A and B are from the same host, "Corylus", and C and D are from host "Ostrya". So the effect between the two host tree species is real, but a difference between two communities inside the same host (e.g. A vs B) might not be. That is also confirmed by the diversity indices I got (see my latest post): they show that A and B are always different from C and D, but between A and B (and for sure also between C and D) there are no statistical differences (ANOVA).
>
> I think there is both a "host" and a "community" effect, and if I use them separately I get:
>
> adonis(sqrtABCD ~ Host, method="bray", data=env.table, permutations=99)
>
> Call:
> adonis(formula = sqrtABCD ~ Host, data = env.table, permutations = 99, method = "bray")
>
>             Df SumsOfSqs  MeanSqs F.Model     R2 Pr(>F)
> Host       1.0   1.64429  1.64429 5.38984 0.1242   0.01 **
> Residuals 38.0  11.59276  0.30507         0.8758
> Total     39.0  13.23705                  1.0000
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> adonis(sqrtABCD ~ Community, method="bray", data=env.table, permutations=99)
>
> Call:
> adonis(formula = sqrtABCD ~ Community, data = env.table, permutations = 99, method = "bray")
>
>             Df SumsOfSqs  MeanSqs F.Model     R2 Pr(>F)
> Community  3.0   2.43264  0.81088 2.70182 0.1838   0.01 **
> Residuals 36.0  10.80441  0.30012         0.8162
> Total     39.0  13.23705                  1.0000
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Thank you so much to all who want to write any comments on that...
> Cheers,
> Gian
Re: [R-sig-eco] Fwd: how to calculate "axis variance" in metaMDS, pakage vegan?
asy different from Ostrya one. ...So, I think that the "Host" effect is clear, while the effect of "Community" may not be, because the areas are similar two by two... is that right? When I plot MNS.2 and look at the graph, I clearly see that the sample points of areas A and B (Corylus) are positioned on the left side, while areas C and D (Ostrya) are more scattered and positioned towards the lower right... So, what else to say... I'll leave room for any comments.

Thank you all,
Gian

--
Dr. Gabriel Singer
Department of Freshwater Ecology - University of Vienna
and Wassercluster Lunz Biologische Station GmbH
+43-(0)664-1266747
gabriel.sin...@univie.ac.at
Re: [R-sig-eco] Fwd: how to calculate "axis variance" in metaMDS, pakage vegan?
Hi Gian and others, I think we better stop worrying about subjective interpretations of emotional backgrounds of what in other aspects are absolutely helpful discussion threads... I guess part of the challenge on this mailing list is to span the whole range of expertise with useful discussion/output/help for everyone, be it a student or an expert. I found this mailing list very helpful many times for my own questions, but also very informative when just following the threads on other questions... Gian, in my opinion, 2 dimensions are absolutely ok, especially if they do visualize an (obvious) effect in your study. In other words, if 2 dimensions show you an effect of "Host" but not of "Area", the effect is obviously strong enough. Then I would not worry about stress too much. However, there may still be an effect of "Area", maybe visible in more dimensions, but it´s obviously of minor importance. I personally like a combination of NMDS with the permutational MANOVA approach (by Marti Anderson) implemented in the function adonis() in vegan. You can use the same dissimilarity measure (Bray-Curtis) used for the NMDS and can test the "Area" vs. the "Host" effect on parasite (was it?) composition. I think that could be a very useful complement to an NMDS-derived ordination plot and then you may also regard high-stress "representations" (and that´s what all the low-dimensional ordination plots really ARE!) in a different light. Complementations like the permanova are in my opinion better than trying the full spectrum of ordination methods until finally some kind of pattern gets uncovered (comes quite close to the much too often encountered data-fishing expeditions). And though copying analysis strategies is probably not quite like throwing yourself in front of a bus, there is some benefit in using what people working in a specific field regard their "standard" methods (wait for the reviews to discover this). 
In any case, a responsible choice for a type of analysis is oriented along the study design and the data at hand.

cheers, gabriel

Gian Maria Niccolò Benucci wrote:
> Hi Gavin and hi all,
>
> I will not go in front of a bus for sure; I am not mad, at least not yet... :) I would like to tell you that I am a Ph.D. student and, as far as I know, Ph.D. students still have to understand things by studying those who wrote before them... Isaac Newton became famous not only for his science but also for a famous phrase that, if I remember it correctly, goes like this: "If I have seen so far, it is because I stand on the shoulders of Giants"... I think it needs no comment and expresses the concept by itself...
>
> So, I am sorry, I also don't like the "me too" attitude, but you don't know what my reality is like here. I can assure you that even though I am still a "student", I am alone in my research; I have a tutor and boss as Italian rules require, but I don't have a boss for statistics, because nobody could help me with that... So what could I do if I don't take models from already published literature? Anyway, I don't want to seem like the victim. I have a brain that works and I am doing my best to understand and improve my knowledge, and to learn and grow, step by step and with great humility, in science and in this case in statistics...
>
> Anyway... to continue the brainstorm, if I may... The Host effect is what I think is most interesting from the ecological point of view of my trials, also because the 4 communities share the same host two by two, I mean A and B, Corylus, while C and D, Ostrya... If I plot the factors of the envfit onto the graph, the evidence of separation seems clear...
>
> These are my metaMDS results with 2 and 3 dimensions:
>
> NMS.1
>
> Call:
> metaMDS(comm = sqrtABCD, distance = "bray", k = 2, trymax = 100, autotransform = F)
>
> Nonmetric Multidimensional Scaling using isoMDS (MASS package)
>
> Data:     sqrtABCD
> Distance: bray shortest
>
> Dimensions: 2
> Stress:     24.54342
> Two convergent solutions found after 18 tries
> Scaling: centring, PC rotation, halfchange scaling
> Species: expanded scores based on ‘sqrtABCD’
>
> NMS.ABCD.2ef
>
> ***FACTORS:
>
> Centroids:
>               NMDS1   NMDS2
> CommunityA  -0.3271  0.1984
> CommunityB  -0.1956  0.1768
> CommunityC   0.2520 -0.2847
> CommunityD   0.2706 -0.0905
> HostCorylus -0.2613  0.1876
> HostOstrya   0.2613 -0.1876
>
> Goodness of fit:
>               r2   Pr(>r)
> Community 0.1897 0.017982 *
> Host      0.1778 0.001998 **
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> P values based on 1000 permutations.
>
> NMS.1.3
>
> Call:
> metaMDS(comm = sqrtABCD, distance = "bray", k =
Re: [R-sig-eco] capscale() for PCoA-CDA
Dear Jari and others, Hi everybody, Anybody has used capscale() in package vegan to compute a PCoA-CDA as suggested by Anderson and Willis 2003 (Ecology 84: 511 ff) using one or more factors as "predictors"? Then I wonder about: *) How to interpret interactions of factors? Why are interactions (specified as "~factor1*factor2" in the function call) shown as continuous predictors (using arrows) in the plot function? Wouldn´t centroids for all cells in the design be more appropriate? Aren´t factorial interactions in a CDA setting more or less meaningless? Internally capscale() uses constrasts of variables, and they are treated as continuous variables and shown as arrows in plots. However, if the constrasts correspond to simple factors, they are not drawn but their centroids are shown. For ordered factors you get both centroids and the arrows. The interactions of contrasts cannot be shown as simple class means and therefore they are drawn as arrows. The simple centroids are not appropriate, but you should have centroids of all combinations of class levels of interacting factors. If you think that factorial interactions in *** (what is CDA?) are meaningless, why do you want to use them? I wouldn't say they are meaningless, because that depends on your meaning. Often they are difficult to interpret, but that's another issue. I understand the arrows for interactions now, thanks. I used CDA in the sense of Anderson and Willis 2003 (and others) as Canonical Disicriminant Analysis, as such it is - at least to my understanding - equivalent to Discriminant Function Analyses. When CDA aka DFA is used with 2 interacting factors, it will try to best separate groups and that is *any groups*, and I can´t see why (and how) there should be preference given to any grouping criterion (factor 1, factor 2 or both)... In the end a 4-level factor should be as good as a 2*2 factorial combination. In this sense I used the word "meaningless". 
In fact, capscale() results for a 1*4 constraint (1 factor, 4 levels) are identical with a 2*2 constraint. However, centroids are at differnt positions (!), in fact centroids of all combinations of class levels are at weird (wrong as I think) positions in the 2*2 case!? Still, "interactions" finally make sense when interpreting the plot, that´s quite true. *) How to get classification statistics? And how to efficiently run a "leave 1 out" classification analysis? I thought of manually writing code that checks for the closest centroid. Would it be appropriate to use Euclidean distance as a criterion for this since it happens in PCo space? Probably there are more efficient functions which I do not know of, yet,... for example a function that allows extraction of distances of all objects to all centroids? There is no such thing. Contributed code will be reviewed for inclusion into vegan. *) Is the application of capscale on a Euclidean distance matrix equivalent to a classical DFA aka CDA on the original data - or am I completely wrong with this idea? No, it isn't equal to "DFA aka CDA". Perhaps... Depends on what are DFA and CDA. With Euclidean distances, capscale() is equivalent to redundancy analysis (RDA). Guessing that "DFA aka CDA" are discriminant analysis, RDA is not equal to them. The major difference is that RDA uses no information about scatter of points with respect to the class centroids, but it only uses class centroids. The RDA tries to maximize the distances among class centroids, but it doesn't try to maximize the separation of points of different classes. The methods are very different although the results may have some similarities. This is connected to the previous question: because RDA (that is in the heart of capscale()) does not try to optimize in classification, there is no classification statistic to be optimized. That should be estimated independently of the analysis and after the analysis, and there are no functions for the purpose in vegan. 
Slightly confused now... Anderson and Willis (2003) describe PCoA on a dissimilarity structure, followed by CDA or CCorA and call the procedure CAP (Canonical A of Principal Coordinates). I will call the latter two approaches PCoA-CDA and PCoA-CCorA. Now, I get that CCorA differs from RDA mainly conceptually, so there is not much (any?) difference between PCoA-CCorA and PCoA-RDA = capscale(). Now, is PCoA-CDA really equivalent to db-RDA (in the sense of Legendre and Anderson 1999)? I initially thought this would be the case. They both use a set of dummy variables to code for the factor and treat these as continous predictors. A second thought tells me they can´t be the same. Then maybe what´s left is only the term capscale() which is not the same as CAP in the case of PCoA-CDA... Seems I am getting lost in the panoply of acronyms, sorry... *) Given only one factor as a "predictor", I guess using permutest() or anova() on an object resulting from capscale is comple
[R-sig-eco] capscale() for PCoA-CDA
Hi everybody, Has anybody used capscale() in package vegan to compute a PCoA-CDA as suggested by Anderson and Willis 2003 (Ecology 84: 511 ff) using one or more factors as "predictors"? If so, I wonder about the following:

*) How to interpret interactions of factors? Why are interactions (specified as "~factor1*factor2" in the function call) shown as continuous predictors (using arrows) in the plot function? Wouldn't centroids for all cells in the design be more appropriate? Aren't factorial interactions in a CDA setting more or less meaningless?

*) How to get classification statistics? And how to efficiently run a "leave 1 out" classification analysis? I thought of manually writing code that checks for the closest centroid. Would it be appropriate to use Euclidean distance as a criterion for this since it happens in PCo space? Probably there are more efficient functions which I do not know of yet... for example a function that allows extraction of distances of all objects to all centroids?

*) Is the application of capscale on a Euclidean distance matrix equivalent to a classical DFA aka CDA on the original data - or am I completely wrong with this idea?

*) Given only one factor as a "predictor", I guess using permutest() or anova() on an object resulting from capscale is completely equivalent to a direct application of adonis()? Correct?

These are lots of questions at once and no code to play with, sorry... Thanks for any help! Gabriel ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
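A minimal sketch of the "closest centroid" idea from the second question, for readers who want something to play with. Everything here is an assumption for illustration: the built-in iris data stand in for real data, cmdscale() supplies the principal coordinates, the helper name loo_nearest_centroid is hypothetical, and the ordination is not recomputed for each left-out object, which is a simplification. It is not a vegan function and not necessarily what Anderson and Willis do.

```r
# Sketch only: leave-one-out nearest-centroid classification in PCoA space.
X   <- scale(iris[, 1:4])            # stand-in data (assumption)
grp <- iris$Species                  # the single grouping factor
pco <- cmdscale(dist(X), k = 2)      # PCoA on Euclidean distances

# Hypothetical helper: for each object, compute class centroids without it,
# then assign it to the class whose centroid is closest (Euclidean distance).
loo_nearest_centroid <- function(scores, groups) {
  groups <- factor(groups)
  pred <- character(nrow(scores))
  for (i in seq_len(nrow(scores))) {
    train <- scores[-i, , drop = FALSE]
    g     <- groups[-i]
    # rows = classes, columns = ordination axes
    cen <- apply(train, 2, function(col) tapply(col, g, mean))
    # distance from object i to every class centroid
    d <- sqrt(rowSums(sweep(cen, 2, scores[i, ])^2))
    pred[i] <- names(d)[which.min(d)]
  }
  factor(pred, levels = levels(groups))
}

pred      <- loo_nearest_centroid(pco, grp)
confusion <- table(observed = grp, predicted = pred)
accuracy  <- mean(pred == grp)
print(confusion)
print(accuracy)
```

The confusion table and the proportion of correct assignments are one possible form of "classification statistics"; whether Euclidean distance in the retained PCo axes is the right criterion remains the open question of the post.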
Re: [R-sig-eco] how to calculate "axis variance" in metaMDS, package vegan?
gian, you may try consecutive MDS analyses with an increasing number of dimensions (the parameter k in the isoMDS() or metaMDS() function). Then plot stress against the number of dimensions and judge it like a scree plot in PCA. This should tell you how many dimensions to use for the MDS, and thereby also the associated stress value. cheers, gabriel

Gian Maria Niccolò Benucci wrote: Okay, many thanks... So having a low stress value is fundamental, since the lower it is, the better the model fits the data, is that right? How can I know if my stress is acceptable? I mean, whether it is low enough to conclude that the model represents the sample data well in the plot... Is there a threshold or something? I would appreciate any PDF or review on ordination models for community ecology data... :) Thank you very much! Cheers, Gian

2009/12/1 Gian Maria Niccolò Benucci: Hi there, I am trying to use the function metaMDS (vegan package) for community ecology data, but I find no way to calculate the "expressed variance" of the first 2 axes. Is there a way to do that? Thanks a lot in advance, Gian

-- Dr. Gabriel Singer Department of Freshwater Ecology - University of Vienna and Wassercluster Lunz Biologische Station GmbH +43-(0)664-1266747 gabriel.sin...@univie.ac.at
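The stress-versus-dimensions check suggested above can be sketched as follows. This is only an illustration under stated assumptions: the built-in mtcars data and Euclidean distances stand in for a community table and its dissimilarities, and isoMDS() from MASS is used; with vegan, the stress element of a metaMDS(comm, k = k) result could be collected in the same way.

```r
# Sketch only: stress as a function of the number of MDS dimensions k.
library(MASS)                          # provides isoMDS()

d  <- dist(scale(mtcars))              # stand-in dissimilarities (assumption)
ks <- 1:5                              # numbers of dimensions to try

# isoMDS() reports stress in percent
stress <- sapply(ks, function(k) isoMDS(d, k = k, trace = FALSE)$stress)

# "Scree-like" plot: look for the k beyond which stress stops dropping much
plot(ks, stress, type = "b",
     xlab = "Number of dimensions (k)", ylab = "Stress (%)")
print(data.frame(k = ks, stress = stress))
```

Where the curve flattens is a judgement call, just as with a scree plot in PCA.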
Re: [R-sig-eco] how to calculate "axis variance" in metaMDS, package vegan?
hi gian, no, there is no such way. An MDS can't express "explained variance". However, the stress value is the overall measure of quality of fit of your MDS to the data. There are various measures of stress, but loosely speaking you can regard the stress as a percentage of variation NOT explained by ALL dimensions in your MDS. cheers, g

Gian Maria Niccolò Benucci wrote: Hi there, I am trying to use the function metaMDS (vegan package) for community ecology data, but I find no way to calculate the "expressed variance" of the first 2 axes. Is there a way to do that? Thanks a lot in advance, Gian
Re: [R-sig-eco] vegan: envfit (vectorfit)
gavin and jari, thanks, it all makes sense. Remembering the discussion we had some weeks ago about fitting underlying (or environmental) variables to an MDS ordination, I have to say that using vectorfit for this purpose would indeed make sense to me, too, as long as a linear or at least monotonic behaviour of the metric variable over ordination space is checked (e.g. using ordisurf) before choosing the representation as a vector (which would indeed suggest linear behaviour over ordination space). Or are there different opinions? cheers, g

Gavin Simpson wrote: On Tue, 2009-09-15 at 17:02 +0200, gabriel singer wrote: Hi vegan-users and programmers, Can anybody tell me exactly how the function vectorfit (envfit) computes arrow lengths (as fits of a metric variable onto an ordination)? I understand the scaling bit at the end, but have trouble understanding how the direction and strength of the gradient of the environmental variable within the ordination is actually identified. Obviously it's not a mere correlation between the environmental variable and the ordination scores, as is usually done for a PCA for example (the "loadings" as opposed to the eigenvectors).

It is a least squares fit of the following form: Y ~ scores1 + scores2 where Y is the vector or matrix of numeric variables you wish to have vectors for, and scores1 and scores2 are the user-selected axes of the ordination configuration. If Y is a matrix then each variable (column) in that matrix enters as a separate regression. Effectively, it uses the locations of the points (sites) in the selected 2D ordination space to predict the observed values of the variables for which vectors are being fitted. The arrow heads are the normalised coefficients for scores1 and scores2, and hence represent the normalised change in response for a unit change in scores1 and scores2 (the axis or site scores).
As these are normalised, the larger the coefficient (change in response for unit change in the site scores), the stronger the relationship between the site scores and the vector. A key issue in the implementation is to consider the ordination space into which you project vectors as a 2D configuration of points, and we want to relate these "locations" to the values of a secondary set of variables. HTH G

thanks a lot for any good ideas.. gabriel
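The least squares fit described in this reply can be reproduced by hand in base R. The sketch below is an illustration only: mtcars stands in for real data, cmdscale() supplies a 2D configuration in place of an MDS result, and drat plays the role of the environmental variable. To my understanding, vegan's envfit() reports comparable unit-length directions together with an r2 value, but that correspondence is stated here as an assumption, not checked against vegan.

```r
# Sketch only: fit one "environmental" variable onto a 2D ordination by
# ordinary least squares, Y ~ scores1 + scores2.
X      <- scale(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])
scores <- as.data.frame(cmdscale(dist(X), k = 2))   # stand-in ordination
names(scores) <- c("scores1", "scores2")
Y      <- mtcars$drat                                # stand-in variable

fit <- lm(Y ~ scores1 + scores2, data = scores)

# Direction of the arrow: regression coefficients scaled to unit length
b     <- coef(fit)[c("scores1", "scores2")]
arrow <- b / sqrt(sum(b^2))

# Strength of the relationship: squared multiple correlation of the fit
r2 <- summary(fit)$r.squared

print(arrow)
print(r2)
```

The arrow gives the direction in which the fitted value of Y increases fastest across the configuration; r2 says how well a plane describes Y over that configuration, which is why checking for linearity first (for example with ordisurf) matters.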
[R-sig-eco] vegan: envfit (vectorfit)
Hi vegan-users and programmers, Can anybody tell me exactly how the function vectorfit (envfit) computes arrow lengths (as fits of a metric variable onto an ordination)? I understand the scaling bit at the end, but have trouble understanding how the direction and strength of the gradient of the environmental variable within the ordination is actually identified. Obviously it's not a mere correlation between the environmental variable and the ordination scores, as is usually done for a PCA for example (the "loadings" as opposed to the eigenvectors). thanks a lot for any good ideas.. gabriel
Re: [R-sig-eco] wascores() for metaMDS?
Dear Jari and Gavin, thanks a lot, everything clear... with the connection to CCA I now get the meaning of the species scores, almost trivial after all... gg

Gavin Simpson wrote: On Wed, 2009-08-19 at 11:40 +0200, gabriel singer wrote: Hi sig-ecology! Here comes a probably stupid question... I am looking for smart ways to include information about underlying variables in MDS plots. In other words, after having computed an ordination with isoMDS or metaMDS from a community table, I would like to add something like species coefficients/loadings as vectors to the plot of sites. As no species coefficients exist in this case, the best I could come up with so far is simply vectors calculated from correlation coefficients of the individual species with the site scores (on two MDS axes). The function metaMDS allows computing "species scores" using the function wascores(). I have now pondered for 2 days how these scores are calculated and what their precise meaning would be.

An individual taxon's "species score" is computed as the weighted average of the "site scores", the weights being the abundance of that taxon in each site. It is the abundance-weighted centroid of all the samples in which the species occurs. The motivation for this is that in CA, species scores are weighted averages of site scores that are themselves weighted averages of species scores, and so on, in the two-way algorithm of Mark Hill - not that vegan computes the CA solution that way in cca() - so it is an analogous approach to computing species scores for nMDS.

Would these species scores be appropriate to show as vectors in the MDS?

Not as vectors, as that implies directionality or increasing abundance, and there is no reason to assume that the abundance of a given taxon will increase linearly or even monotonically in a given direction across the nMDS plot.
Although I hesitate to call it that, the species score, computed as the weighted average of the site scores, is an optimum (of nMDS site scores), and thus abundance declines as one moves away from the point. So in this sense, you display the species scores in the same manner as on a CA or CCA plot, as a point, instead of the vector used in PCA/RDA. However, the decline in CA is uniform in any direction (fitted, not actual, abundance), i.e. in 2-D the species score is the point at the top of a 2-D bell-shaped surface, as this is the implied response model in CA. With nMDS there is no reason to assume this is the case. For one or two taxa, you could just project a surface of actual abundances using ordisurf(), or you could just use the points as you would in a CA diagram, more or less. The problem with the surface approach is that you can only show a couple of species at most on a single ordination plot. ordisurf would likely be the best option for most extra data you wish to impose on the nMDS plot, again for the reason that the relationship between nMDS axes and the variable of interest need not be a simple linear or monotonic surface. HTH G

Thanks for any answer... Gabriel Singer
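The weighted-average definition of species scores given in this reply is short enough to write out in base R. The sketch below uses a tiny made-up community table and a cmdscale() configuration as stand-ins (both are assumptions for illustration). In vegan, wascores(site_scores, comm) is the function under discussion and, as far as I know, should return the same values with its default settings.

```r
# Sketch only: species scores as abundance-weighted averages of site scores.
comm <- matrix(c(5, 0, 2,
                 1, 3, 0,
                 0, 4, 6,
                 2, 2, 2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("site", 1:4), paste0("sp", 1:3)))

site_scores <- cmdscale(dist(comm), k = 2)   # stand-in for MDS site scores

# For each species: sum(abundance * site score) / sum(abundance), per axis
species_scores <- sweep(t(comm) %*% site_scores, 1, colSums(comm), "/")

print(species_scores)

# Plot species as points (not arrows) on top of the sites
plot(site_scores, pch = 16, xlab = "Axis 1", ylab = "Axis 2")
text(species_scores, labels = rownames(species_scores), col = "red")
```

Each species lands at the abundance-weighted centroid of the sites where it occurs, which is why it is drawn as a point rather than as a vector.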
[R-sig-eco] wascores() for metaMDS?
Hi sig-ecology! Here comes a probably stupid question... I am looking for smart ways to include information about underlying variables in MDS plots. In other words, after having computed an ordination with isoMDS or metaMDS from a community table, I would like to add something like species coefficients/loadings as vectors to the plot of sites. As no species coefficients exist in this case, the best I could come up with so far is simply vectors calculated from correlation coefficients of the individual species with the site scores (on two MDS axes). The function metaMDS allows computing "species scores" using the function wascores(). I have now pondered for 2 days how these scores are calculated and what their precise meaning would be. Would these species scores be appropriate to show as vectors in the MDS? Thanks for any answer... Gabriel Singer