Re: [R-sig-eco] predict NMS scores for new samples
Hi Jonathan, If you're using the metaMDS function in vegan with the monomds engine then it's possible. I have posted a function (monomds) at the bottom of http://ecology.msu.montana.edu/labdsv/R/labs/lab9/lab9.html that shows how to generate a labdsv:::nmds object from vegan's monomds function (courtesy of Peter Minchin). You will have to have package vegan loaded to get the monomds FORTRAN code loaded. Then you can use the function addpoints.nmds (also at the bottom of that page) to add points to an existing nmds. I'm not convinced that it's a good idea, but I worked with someone who needed it to fulfil contract obligations so I wrote it. Dave On 10/10/2014 11:07 AM, Jonathan Coop wrote: hi all, I'm wondering if anyone has developed code to predict NMS scores for new sample data, as the NMDS Scores Prediction algorithm functioned in PC-Ord. I am looking at the effects of multiple fires (reburns) on vegetation composition. I used an NMS ordination of pre-reburn vegetation samples to characterize vegetation, examine relationships between vegetation and subsequent fire severity, and stratify a re-sampling effort by key vegetation "types". I'd love to go back now with the post-reburn vegetation samples and see where they fall out in the original ordination space. Any and all thoughts welcome! Thanks, Jonathan Coop [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] a possible hack for adding species scores?
Hi Eliot, With respect to leafiness you're in a common problem area. Ordinal variables are hard to handle from any approach. One alternative (only undertaken with a well articulated rationale) is to promote them to interval variables, ordisurf(demo, as.numeric(to.ordinate$leafiness)) If you think the scale is wrong you can try other, e.g. 0,1,4,9,16,25 Otherwise you can do logistic regression on the classes ordisurf(demo, to.ordinate$leafiness==1, family=binomial) but you have one fewer degree of freedom than expected which can be a problem on a small data set. Alternatively, you can use envfit as you have been doing, but that also assumes they are categorical as opposed to ordered. Maybe someone else will weigh in with a better alternative for ordinal variables. Dave On 08/29/2014 11:05 AM, Eliot Miller wrote: Hi Dave, Thanks a million for the response! Your solution is good in some respects--when applied to my real data it shows me why I should be cautious of assuming any linearity in the response of my continuous variables. However, it doesn't work for the factors (which makes sense). You can run, ordisurf(demo, to.ordinate$leafiness) to see what I mean. Would it be reasonable to use some combination of ordisurf for the continuous variables, with the locations of the factors determined via envfit? Thanks again for the push in the right direction! Cheers, Eliot On Wed, Aug 27, 2014 at 10:24 AM, Eliot Miller wrote: Hi all, A few months ago I ran into a problem that others seem to have had. In short, if you use the FD package to calculate a Gower distance matrix which you then either feed back into vegan or use directly in FD, you "lose the species" scores associated with the points. You can't see what explains the distances among points. Examples of similar problems are here: http://r.789695.n4.nabble.com/Ordination-Plotting-Warning-Species-scores-not-available-td4664025.html https://www.mail-archive.com/r-help@r-project.org/msg153765.html I am less concerned with the plotting of the results than the previous posters have been. I am more concerned with simply seeing the table of "loadings" (don't think that term is correct for the PCoA). Perhaps this is an acceptable and widespread solution to the problem, and I simply haven't stumbled on it yet. It takes advantage of the vegan function envfit(). Or perhaps it is an incorrect solution. I'm wondering which of those two things it is, and hoping for a bit of input. In my case, I am making a distance matrix among foraging observations spread across a number of species. I want to know each species' niche size using the FDis metric. I made an example dataset and completely worked through example. The main point here is that I want to know whether the object "fitted" at the end is correctly explaining how things are "loaded". Any help is greatly appreciated. Thanks in advance! Eliot library(FD) #these first few lines are just building up an example dataset of foraging observations set.seed(0) maneuver<- sample(c(rep("sally",10),rep("probe",10),rep("glean",10),rep("hover",10))) set.seed(0) substrate<- sample(c(rep("leaf",10),rep("flower",10),rep("air",10),rep("branch",10))) set.seed(0) attack.height<- rnorm(40, mean=5) set.seed(0) relative.height<- rnorm(40, 0.5) relative.height[relative.height< 0]<- 0 relative.height[relative.height> 1.1]<- 1.1 set.seed(0) leafiness<- sample(c(0:5), size=40, replace=TRUE) #convert this to an ordinal variable leafiness<- ordered(leafiness) set.seed(0) distance<- sample(c(1:4), size=40, replace=TRUE) #convert to ordinal distance<- ordered(distance) #here is the complete dataset of example foraging observations to.ordinate<- data.frame(maneuver, substrate, attack.height, relative.height, leafiness, distance) #now make a second table that identifies to which species each observation belongs road.map<- matrix(ncol=40, nrow=3, 0) colnames(road.map)<- 1:40 row.names(road.map)<- c("species1","species2","species3") #this for loop assigns each foraging observations to one of the species set.seed(0) for(i in 1:dim(road.map)[2]) { species<- sample(row.names(road.map), 1) road.map[species,i]<- 1 } #run the FD calculations FDresults<- dbFD(x=to.ordinate, a=road.map, corr="lingoes", calc.FRic=FALSE, calc.FGR=FALSE, calc.CWM=FALSE, calc.FDiv=FALSE, print.pco=TRUE) #use the vegan function envfit() to see what is driving each axis from the PCoA fitted<- envfit(ord=FDresults$x.axes, env=to.ordinate, permutations=0, choices=1:10) [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology
Re: [R-sig-eco] a possible hack for adding species scores?
Hi Elliot, If you copy (or just rename) your FDresults$x.axes to FDresults$sites you can use all the vegan ordi__ functions on your results. E.g. FDresults$sites <- FDresults$x.axes demo <- ordiplot(FDresults) ordisurf(demo,to.ordinate$attack.height) which is a little more informative than envfit unless you're convinced the response should be linear. In addition, if you want, you could add the species in FDresults$species <- t(road.map) demo <- ordiplot(FDresults) to see species centroids. Dave On 08/27/2014 09:24 AM, Eliot Miller wrote: Hi all, A few months ago I ran into a problem that others seem to have had. In short, if you use the FD package to calculate a Gower distance matrix which you then either feed back into vegan or use directly in FD, you "lose the species" scores associated with the points. You can't see what explains the distances among points. Examples of similar problems are here: http://r.789695.n4.nabble.com/Ordination-Plotting-Warning-Species-scores-not-available-td4664025.html https://www.mail-archive.com/r-help@r-project.org/msg153765.html I am less concerned with the plotting of the results than the previous posters have been. I am more concerned with simply seeing the table of "loadings" (don't think that term is correct for the PCoA). Perhaps this is an acceptable and widespread solution to the problem, and I simply haven't stumbled on it yet. It takes advantage of the vegan function envfit(). Or perhaps it is an incorrect solution. I'm wondering which of those two things it is, and hoping for a bit of input. In my case, I am making a distance matrix among foraging observations spread across a number of species. I want to know each species' niche size using the FDis metric. I made an example dataset and completely worked through example. The main point here is that I want to know whether the object "fitted" at the end is correctly explaining how things are "loaded". Any help is greatly appreciated. Thanks in advance! Eliot library(FD) #these first few lines are just building up an example dataset of foraging observations set.seed(0) maneuver<- sample(c(rep("sally",10),rep("probe",10),rep("glean",10),rep("hover",10))) set.seed(0) substrate<- sample(c(rep("leaf",10),rep("flower",10),rep("air",10),rep("branch",10))) set.seed(0) attack.height<- rnorm(40, mean=5) set.seed(0) relative.height<- rnorm(40, 0.5) relative.height[relative.height< 0]<- 0 relative.height[relative.height> 1.1]<- 1.1 set.seed(0) leafiness<- sample(c(0:5), size=40, replace=TRUE) #convert this to an ordinal variable leafiness<- ordered(leafiness) set.seed(0) distance<- sample(c(1:4), size=40, replace=TRUE) #convert to ordinal distance<- ordered(distance) #here is the complete dataset of example foraging observations to.ordinate<- data.frame(maneuver, substrate, attack.height, relative.height, leafiness, distance) #now make a second table that identifies to which species each observation belongs road.map<- matrix(ncol=40, nrow=3, 0) colnames(road.map)<- 1:40 row.names(road.map)<- c("species1","species2","species3") #this for loop assigns each foraging observations to one of the species set.seed(0) for(i in 1:dim(road.map)[2]) { species<- sample(row.names(road.map), 1) road.map[species,i]<- 1 } #run the FD calculations FDresults<- dbFD(x=to.ordinate, a=road.map, corr="lingoes", calc.FRic=FALSE, calc.FGR=FALSE, calc.CWM=FALSE, calc.FDiv=FALSE, print.pco=TRUE) #use the vegan function envfit() to see what is driving each axis from the PCoA fitted<- envfit(ord=FDresults$x.axes, env=to.ordinate, permutations=0, choices=1:10) [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] fonction spc.pres in labdsv package
Hi Lisa, Sorry, I'm several days behind in monitoring the list. I'm not sure I can help without a copy of your data. Since your dataframe appears to have both strings ("i","p",etc) and integers I'm not sure what you're getting. table(unlist(veg)) ignores NAs, although I think vegtrans would have warned you about NAs. If the data file is too large to send as an attachment (to me, not the list), perhaps you could show snippets. E.g. > veg<-read.csv(file=file.choose(),dec=",",sep=";", header=TRUE) > any(is.na(veg)) > veg[1:10,1:10] and then > veg<-read.csv(file=file.choose(),dec=",",sep=";", header=TRUE) > any(is.na(newveg)) > newveg[1:10,1:10] and then > spc.pres<- apply(newveg>0,2,sum) > any(is.na(spc.pres)) Dave Roberts On 04/19/2013 06:49 AM, lisa couet wrote: Hi, I have an issue concerning the fonction spc.pres in package labdsv. the message is: Erreur dans plot.window(...) : 'xlim' nécessite des valeurs finies De plus : Messages d'avis : 1: In min(x) : aucun argument trouvé pour min ; Inf est renvoyé 2: In max(x) : aucun argument pour max ; -Inf est renvoyé 3: In min(x) : aucun argument trouvé pour min ; Inf est renvoyé 4: In max(x) : aucun argument pour max ; -Inf est renvoyé I code: veg<-read.csv(file=file.choose(),dec=",",sep=";", header=TRUE) attach(veg) ((to change my data into Braun Blanquet index )) library(labdsv) x<-c("i","p","p1","p2","p3",1,12,13,14,15,21,22,23,24,25,31,32,33,34,35,41,42,43,44,45,51,52,53,54,55) y<-c(0.5,3.0,3.0,3.0,3.0,15.0,15.0,15.0,15.0,15.0,37.5,37.5,37.5,37.5,37.5,62.5,62.5,62.5,62.5,62.5,85.0,85.0,85.0,85.0,85.0,97.5,97.5,97.5,97.5,97.5) newveg<-vegtrans(veg,x,y) (( abondance distribution)) ab<- table(unlist(newveg)) ((abondance s'affiche bien) ((Number of occurrences)) spc.pres<- apply(newveg>0,2,sum) plot(sort(spc.pres)) the error message is here. I have many NA because it is a file where columns are species and row elevation. So when specie is not present, nothing is in the cell. I hope one of you can help me, kind regards, lisa [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Zuur / Pinierho / Faraway
Philip, IS there an online errata,or do you just have to be smart and diligent? Thanks, Dave On 11/29/2012 07:30 AM, Dixon, Philip M [STAT] wrote: I agree with all the previous comments and second Tom's recommendations of Faraway as an 'in between' Zuur and Piniehro& Bates. One thing to be careful of: While the advice in Faraway is sound, there are more than a few mistakes in his equations. Philip Dixon ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] IndVal groups & Plotting an RDA with labels
Caitlin, Depending on the size of your data set one way to identify clusters is to label the dendrogram with cluster number. Lets say your Ward's result is called 'ward.hcl', and your cluster membership (perhaps returned from cutree) is in a vector called 'clustid' > plot(ward.hcl,labels=clustid) will produce the dendrogram with cluster number identified on the bottom. If the number of plots is too high they overwrite and it gets hard to read, but usually you can make it out if you stretch the plot. Alternatively, size is sometimes indicative, and you can simply > table(clustid) and see if cluster size is unique. Dave Roberts On 07/03/2012 07:12 AM, Caitlin Porter wrote: Dear all, I am working with a large data set to examine plant community structure on barrens habitats and have a couple of questions regarding Indval (labdsv package) on Wards Clusters (hclust function in stats package) and plotting axes of an RDA result with plot, plot.cca or esqplot functions (vegan, MASS, graphics packages) 1. I have run IndVal (package labdsv) on a Ward’s Cluster Analysis to determine indicator species for groups. I selected k=5 groups as my most meaningful and conservative (sample size, etc) classification. However, according to my average silhouette width, k=12 is the most optimal number of groups. I ran an IndVal on k=12 out of curiosity and notice some interesting patterns I would like to explore further but I am having difficulty understanding the output. The groups are labeled in the indicator species output as “group 1,2,3,4….12”. I do not understand how to determine which group is associated with which cluster (e.g.. in my Ward’s dendrogram – which is group 11?). It is somewhat obvious when k is relatively small because of the order the groups are clustered in, branch height in the dendrogram, and species frequency and abundances, however I’d like to know for larger groups. It would be ideal if I could even label the dendrogram with these groups. I’ve seen examples of these in a couple of papers with color coded boxes, but I can’t seem to figure out how to code it myself. 2. My second question relates to plotting an RDA. I have been able to run an RDA in vegan package successfully but unable to plot it in a way that I can interpret. I need to label sites, species (response matrix) and environmental variables/PCA axes (explanatory matrix). So far, I’ve only been able to label either the response matrix or the explanatory matrix in my graphs, but not all 3 sets of points. I’ve tried modifying plot function and code from Borcard et al 2011, (Numerical ecology with R), esqplot code for MASS package and plot.cca in Vegan package. I would prefer to use esqplot since I understand already how to better customize it, but I’m just looking to get any graph I can read at this point. When I use the plot function from Borcard et al. I see PC axes names only. When I use esqplot, I see species names only. I also tried plot.cca in vegan package but wasn’t able to call up a graph. This code looks like a great way to do it, but I’m not sure what I’m doing, *e.g*. what to put in for const. or what the ‘unexpected symbol’ error means. This old thread asks a similar question ( https://stat.ethz.ch/pipermail/r-help/2009-February/188282.html), but I’m not sure I understand its solution and have approximately 300 species so providing a separate name for each individually might not be feasible. This other thread asks another similar question ( http://r.789695.n4.nabble.com/RDA-Triplot-td3055474.html) but the author finds an error is generated specifying that biplot is not an appropriate method. I have included my code for question 2 below.Any help would be very much appreciated! Sincere thanks, Caitlin *#esqplot (MASS package) * library(MASS) #subset species and sites scores from the rda for first 10 RDA axes sr<- scores(c1.rda, display = c("sites", "species"), choices = c(1,2,3,4,5,6), scaling = 2) sites.only<- as.data.frame(sr$sites) srsp<- as.data.frame(sr$species) # data frame with just the species in it c1.site<- c1[,1] # object with just the site names from the original data set cp.m<- merge(c1.site, sites.only, by=0, sort=FALSE) # merged site object with site names eqscplot(cp.m$RDA1, cp.m$RDA2, xlim=c(-1, 1), ylim=c(-1, 1), col="blue", xlab="RDA Axis 1", ylab="RDA Axis 2", cex=0.3) # defining the variables& limits of plot and what the symbols look like text(cp.m$RDA1, cp.m$RDA2, labels=cp.m$x, col="black", cex=0.3) #adding names on plots text(srsp$RDA1, srsp$RDA2, labels=rownames(srsp),col= "red", cex=0.3) #adding names/species # Error in text.default(cp.m$RDA1, cp.m$RDA2, labels = cp.m$x, col = "black", : zero length 'labels' *#Borcard et al. 2011 plot function* plot(c1.rda, main= "Triplot RDA - scaling 2 -
Re: [R-sig-eco] Fwd: Landscape ecology in R
Manuel, Take a look at Jason Fridley's pages at http://plantecology.syr.edu/fridley/#gsmnp He's primarily using GRASS, but has integrated it with R in many cases. Dave Roberts On 03/11/2012 10:06 PM, Manuel Spínola wrote: -- Forwarded message -- From: Manuel Spínola Date: 2012/3/11 Subject: Re: [R-sig-eco] Landscape ecology in R To: Sarah Goslee Thank you Sarah and Stephen, I am in contact with Paul Galpern who developed some tutorials on landscape ecology in R (the ones that Stephen pointed out). He told me that he plan to release more in the following months. I offered him my contributions because I have some r code to develop spatially explicit occupancy models using landscape metrics, combining the raster, dismo and unmarked packages. I know that Kevin McGarigal at U Mass has some R material too. My idea with the message was to know if there is some tutorial on how to use r because many tools on landscape ecology analysis are produced as tools or extensions fro commercial software such as ArcView or ArcGis. Even open source QGIS do not have a well developed landscape analysis tools for landscape ecology, except the one developed for GRASS. Is not a bad idea to develop R material for landscape ecology. I know that there are a lot of packages or tools that can do the job but I am thinking on some good tutorials on how to combine those tools. I am an advocate of R and open source material and I am sure many in our list would like to use as teaching material. I am teaching a course on wildlife habitat evaluation (that include a lot of material on Landscape Ecology) and I am using R for the labs. The students really like to see all these analysis in R. Best, Manuel ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] [R] Component analysis / cluster analysis of multiple sites based on soil characteristics
Sacha, I do not fully understand your objectives, but there are several things to bear in mind in your approach below. You refer to your result object as water.pca, but it's simply a distance matrix, not a PCA. More problematic, perhaps, is that it's calculated on a matrix with very different values for the columns, e.g. temp is > 30 and no2 is < 0.01. In calculating Euclidean distance (the default for dist()) these scales matter a lot. If it's truly a clustering of sites based on these attributes you want you should standardize the columns before calculating dist(). Once you have a distance matrix from the standardized data you could use pam or agnes (as you have already done) but might also want to see an ordination. Given a Euclideandistance matrix I would recommend Principal Coordinates Analysis (PCO or PCoA depending on source) which I believe is available in the ecodist package you already have loaded. Dave Roberts On 01/23/2012 05:51 AM, Sacha Viquerat wrote: Hello dear list! Maybe I am demanding too much, but I am having problems finding the right way to tackle a seemingly trivial problem: We counted fish at different sites. In order to assess habitat quality at each site, we sampled temperature, pH etc. at each site, resulting in 243 observations of 8 independent variables. As we would like to identify clusters within this data set, we stumbled upon three approaches: two as realized in package cluster, using dist to create a distance matrix from our numeric variables and then pam to produce a model or agnes and then various tree methods to simplify the tree, as well as an approach via the ecodist package (using distance and pco). while results obtained through the cluster package were the same (phew!), the result from the ecodist approach did not identify clusters at all. As we are all confused and I am the one in charge of deciding which way to go, And as I am the one most confused after all, I am completely lost. Doing such an anlysis for the first time, I would be satisfied wit the pam approach identifying 2 clusters (via iterating over each k in 2:10 and picking the max average silhouette of each model). However, as there are so many different approaches out there, I am not sure if all the assumptions are met! It seems for example that pH is more or less randomly distributed. Should we keep such variables? How can I access the actual loadings of the principal axis of the pam model? Couldn't find that anywhere! In the end, there are only 33 observations in the 2nd group, which will be making the further analysis of fish counts heavily unbalanced. Any suggestions? Code snippet: water.par temp pH DO BOD COD no3 no2 po4 1 33.5 7.4 5.30 4.04 15.0 0.120 0.008 0.20 2 33.5 7.4 5.30 4.04 15.0 0.120 0.008 0.20 . . . 243 29.1 7.4 6.80 12.56 45.0 0.740 0.002 0.32 water.pca<-dist(as.matrix(water.par)) k=best.k(water.pca,c(2,10),stand=T,trace=1) #finding the k with highest average silhouette dist clus.model<-pam(water.pca,k,stand=T) clus.model$clusinfo size max_diss av_diss diameter separation 1 210 30.12712 8.445552 42.29439 27.88689 2 33 12.74630 7.725452 21.91972 27.88689 water.md <- distance(water.par, "euclidean") water.pco<-pco(water.md) plot(water.pco$vectors[,1], water.pco$vectors[,2]) Thanks in advance and sorry for the verbosity level at max!!! ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Recommended R package for analyzing community data set? - nested design, stratified random sampling, covariates
Dear Laura, David is certainly correct that vegan would provide a wealth of tools for the analysis of your data. More generally, if you go to CRAN and browse the "environmetrics" Task View you will see an review of many packages suitable for specific analyses you may be interested in. Dave Roberts On 01/15/2012 12:10 PM, Laura S wrote: Dear all: May you recommend an R package for analyzing this data set? I would greatly appreciate any thoughts you can provide. I. Study goals This study examines soil crust (lichens and bryophytes) recovery and succession in fields that underwent different levels of disturbance. II. Variables Response variables of interest: soil crust cover (categorical scale - described below), species richness, species composition Explanatory variable of interest: disturbance regime (categorical variable) Environmental variables measured (covariates - mix of categorical and numerical variables): cover of mineral soil, litter, vascular plant bases, stones, or rocks, slope, aspect III. Study sampling and design Eight research areas (BR, CB, CC, JL, PC, PL, SL, TR) Within each research area subplots were assigned six disturbance treatments (NC, NS, OC, OS, SC, SS) based on disturbance history A single transect was placed randomly in the center of each subplot and sampled in twenty 20 x 20 cm plots at 1 m intervals along the transect 47 of 48 possible treatment subplots were sampled (n=6 for 7 sites, n=5 for 1 site) Sampling cover scale: Scale valueRepresentative % cover 1<= 1 2>1-4 3>4-10 4>10-25 5>25-50 6>50-75 7>75-95 8>95-100 There were three different sampling times (spread over two years), but time of sampling was not considered as a confounding factor given the way sampling was conducted with the particular communities studied (soil crust communities). Total species positively identified: 33 taxa (species and species groups), 15 of these were in four or more of 47 subplots (n=6 x 7, n=5 x 1) Unidentified collected taxa: less than 0.5% Approximate taxa pool (species observed in entire areas, but not necessarily in sample plots): 26 lichen + 21 bryophytes = 47 taxa Thank you for your time and consideration, Laura [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] The problems on BIO-ENV procedure by [R]
Shun Tsuobi, isoMDS (and routines based on it) will not allow zero dissimilarity, which implies perfect replicates. One alternative is to remove one of the replicates, but that may have unsatisfactory effect on further analyses. Alternatively, if you are sure you want to keep the duplicate plots you can change the zero to a small value d[d==0] <- 0.0001 and run the isoMDS on the resulting revised dissimilarity matrix. Undoubtedly someone from the vegan group will respond to your second question. Dave Roberts On 12/05/2011 01:28 AM, 坪井 隼 wrote: Dear Madam / Sir, I have two questions for the use of the “R” program for the ecological research. I am studying the relationship between the community structures of environmental microbes and some environmental conditions. For this objective, I have known that the BIOENV procedure, which was developed by Clarke& Ainsworth (1993), is available on the “R” software. Fist question; I attempted the use of the procedure to analyze the relationship between the variation of the microbial community structures and the environmental factors. However, I can not analyze the relationship based on isoMDS function. The isoMDS was inacceptable for my dataset. The command for the BIOENV procedure, which I programmed, and the error massage I gained was as follow; library(MASS) library(vegan) communitydat<-read.table("C:/Documents and Settings/shuntsuboi/desktop /bray.txt", head- er=T) environdat<-read.csv("C:/Documents and Settings shuntsuboi/desktop/ev. csv",header=T) env<-environdat[,c("variablesA","variablesB.","variablesC","variablesD ","variablesE")] d<- vegdist(communitydat, "bray") isoMDS(d) error isoMDS(d) : zero or negative distance between objects 1 and 2 As mentioned above, I can not run the program because the error, which is “isoMDS(d) : zero or negative distance between objects 1 and 2”, occurred. What are the ways to solve this problem ? On isoMDS function, what are the ways that the zero distance of “Bray-Curtis distance” is acceptable in the function ? Second question; Based on the command as above, I ran the metaMDS function. However, although I could automatically describe the two dimensional ordination plot figure, I could not gain the X and Y value of the respective plots. Then, the error massage was shown as follow; “In ordiplot(x, choices = choices, type = type, display = display, : Species scores not available” What are the ways to solve this problem ? Sincerely yours, Shun Tsuboi ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] post hoc in Kruskal Wallis
If the design is balanced this function will work npcompare <- function (var,class) { k <- length(table(class)) n <- length(var)/k se <- sqrt((n*(n*k)*(n*k+1))/12) rvar <- rank(var) for (i in 1:(k-1)) { left <- sum(rvar[as.numeric(class)==i]) for (j in (i+1):k) { right <- sum(rvar[as.numeric(class)==j]) diff = abs(left-right) cat(paste(i,j,left,right,diff,round(1-pt(diff/se,k),4),'\n')) } } cat('\n') } It's not formatted very elegantly, but it works. Dave R. On 11/24/2011 08:01 AM, Richard L. Boyce wrote: Zar, in Biostatistical Analysis, has a Tukey-like test that is highly recommended. You may need to do it by hand, but it is quite easy. Message: 5 Date: Wed, 23 Nov 2011 18:21:54 +0100 From: "Jakub Szymkowiak" To: Subject: [R-sig-eco] post hoc in Kruskal Wallis Message-ID: Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Hi, does anyone know, how can I perform post-hoc tests (especially Least Significant Difference and Sheffe Test) for results from Kruskal-Wallis test? In KruskaI-Wallis test I found some significant differences between tested groups, but I want to know between which groups this difference is really signifficant. Cheers, Jakub -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Sources for learning mapping/GIS with R?
R and GRASS can both be made to talk to the same database (I use PostgreSQL) at the same time, allowing you you to use the best tool for each job and keep everything in sync. You might also want to look at http://pvanb.wordpress.com/ and Dylan Beaudette at http://casoilresource.lawr.ucdavis.edu/drupal/node/905 Dave Roberts On 10/25/2011 11:58 AM, Marcelino de la Cruz wrote: Also, for open source GIS ( GRASS and R) Neteler, M., Mitasova, H., 2008. Open Source GIS: A GRASS GIS Approach, 3rd Edt. Springer, The International Series in Engineering and Computer Science: Volume 773. 406 p. By the way, you can find a lot of useful resources in the home page of Tomislav Hengl: http://spatial-analyst.net/wiki/index.php?title=Main_Page Cheers, Marcelino At 19:47 25/10/2011, Sarah Goslee wrote: There's a spatial taskview that's the perfect place to start: installing the taskview will give you very nearly all the related packages. http://cran.r-project.org/web/views/Spatial.html For learning how to use them, Bivand et al's Spatial Analysis book is good: http://books.google.com/books/about/Applied_spatial_data_analysis_with_R.html?id=4gg8yx_lcj0C Sarah On Tue, Oct 25, 2011 at 1:11 PM, Raphael Mazor wrote: Can anyone recommend some basic resources for learning how to create maps with R? Or recommend packages that are particularly helpful? Thanks!-- -- Sarah Goslee http://www.functionaldiversity.org ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology Marcelino de la Cruz Rot Depto. Biologia Vegetal EUIT Agricola Universidad Politecnica de Madrid tel: 34 + 913365435 [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Multivariate ANOVA/repeated measures
On 10/10/2011 02:15 PM, Gavin Simpson wrote: On Mon, 2011-10-10 at 09:11 -0700, Rich Shepard wrote: On Mon, 10 Oct 2011, Dave Roberts wrote: I want to compare the results of the two sampling exercises in order to test the performance of the two sampling techniques. I would try something pretty direct. Any appeal to differences in dissimilarities confounds the effects with the particular dissimilarity/distance matrix you use. Assuming the samples and species are in the same order, and that the data.frames are the same size, you might try I did not read the original message, so I hope you'll allow me to join the thread. My recommendation is to use univariate tree models, particularly a classification tree (for ordinal explanatory variables; i.e., ST1 and ST2). But the response here is *multivariate* - of course, one could use Glen De'Ath's multivariate regression trees (despite the name it is really a constrained clustering/classification) - but I think there are better ways of solving this particular problem. And unless one has many 100s of observations, the model will need some sort of variance reduction applied (via bagging, or some such) as the one fitted model is potentially highly unstable. G This is fully, carefully, and non-technically explained in Chapter 9 (particularly Sections 9.3 and 9.4) in Zuur, Ieno, and Smith "Analysing Ecological Data." For that matter, I highly recommend reading the whole book. Rich ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology It would be fairly simple to boil down to a univariate question. You could do something as simple as a paired t-test of plot-level species richness or the number of individuals sampled (to compare sampling efficiency), but I still don't see an independent and a dependent variable. Dave -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Multivariate ANOVA/repeated measures
On 10/07/2011 08:51 AM, Dr N.A. Cutler wrote: Dear All, I have a query about multivariate analysis of community data. In my experiment, 24 microbial communities in different locations were sampled using Sampling Technique 1 (ST1). A site X species matrix was then derived by molecular analysis. The same 24 locations were then sampled again using a different sampling technique (ST2) and a second site X species matrix was derived. It is assumed that community structure remains intact after sampling by Technique 1 i.e. the two techniques can sample from the same pool of organisms. I want to compare the results of the two sampling exercises in order to test the performance of the two sampling techniques. My research question is: does Technique 1 produce a similar signal to Technique 2? Or do the different techniques give significantly different pictures of community structure? The null hypotheses is that there is no significant difference between the two sampling techniques i.e. they both capture community structure with the same degree of accuracy. It occurred to be that I could use a multivariate ANOVA technique (e.g. Adonis) to distinguish between the results of the two sampling exercises, using sampling technique as a factor. But I am not sure how to deal with the obvious correlation between sample pairs. Should this situation be addressed as a repeated measures experiment with two time steps? If so, what is the best technique to use (a mixed model, perhaps?) Any advice would be gratefully received. Best wishes, Nick Cutler ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology Nick, I would try something pretty direct. Any appeal to differences in dissimilarities confounds the effects with the particular dissimilarity/distance matrix you use. Assuming the samples and species are in the same order, and that the data.frames are the same size, you might try > actual <- sum((ST1-ST2)^2) and then permute one of the two matrices numerous times res <- rep(NA,999) for (i in 1:999) { res[i] <- sum((ST1-ST2[sample(1:nrow(ST2),replace=FALSE),])^2) } final <- (sum(res <= actual) + 1)/1000 and see what fraction of the permuted matrices are as similar. Hopefully Gavin will weigh in with a better randomization. If you do go with a multivariate approach I might try a procrustes analysis of PCO ordinations. Dave -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] help with analysis
Hi Kátia, Sorry to be so late; just back in the office. Pedro is correct that adonis will help establish statistical significance to potential differences, but I think NMDS could still be very helpful. One approach would be to code the sites by glyph (e.g. site 1 = circle, site 2 = triangle, etc.) and then to draw arrows from the first date to the second, second to third, etc, for each site. In labdsv you could do this if your nmds is called nmds.object > plot(nmds.object,type=n') to draw the axes but not the points, and then use > points(nmds.object,site==1,pch=1) > points(nmds.object,site==2,pch=2) etc. Then, depending on how the sites are sorted, the arrows could be drawn > arrows(nmds.object$points[1,1],nmds.object$points[1,2],nmds.object$points[2,1], nmds.object$points[2,2]) to draw an arrow form the first point to the second. YOu might need to mess with the parameters of arrows() to get the arrow sizes you want. In vegan you could do > nmds.plot <- plot(metaMDS(dissimilarity or taxon matrix),type='n') > points(nmds.plot$sites[site==1]) > points(nmds.plot$sites[site==2],pch=2) etc > arrows(nmds.plot$sites[1,1],nmds.plot$sites[1,2], nmds.plot$sites[2,1],nmds.plot$sites[2,2]) etc. Hope that helps, Dave -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 On 08/08/2011 08:20 AM, Kátia Emidio wrote: Dear all, I have a data from inventory of plants in 6 fragmented forests of the same size,randomly selected, measured during 3 periods of time, and I'd like to know about changes in species composition over time. It is a good idea to use NMDS for each period of time and then to compare the changes in the ordination diagram?? What kind of analysis could be better?? Cheers, Katia [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] controlling the species displayed on a biplot
Thanks Gavin. That's actually very nice and I suspect just what Andy was looking for. Dave On 05/23/2011 04:33 AM, Gavin Simpson wrote: require(vegan) require(labdsv) data(dune) data(dune.env) ## Indicators of Management type set.seed(38) inds<- with(dune.env, indval(dune, Management)) ## ordinate dune.pca<- rda(dune, scale = TRUE) ## exract scores you want to plot presume species and sites dune.site<- scores(dune.pca, display = "sites", scaling = 3) dune.spp<- scores(dune.pca, display = "species", scaling = 3)[inds$pval<= 0.05, ] ## plot plot(dune.pca, display = c("sites","species"), type = "n", scaling = 3) points(dune.site) arrows(0, 0, dune.spp[,1], dune.spp[,2], col = "red", length = 0.05) lab.xloc<- dune.spp[,1] * 1.2 lab.yloc<- dune.spp[,2] * 1.2 text(lab.xloc, lab.yloc, rownames(dune.spp), col = "red", cex = 0.8) -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] controlling the species displayed on a biplot
On 05/20/2011 11:54 PM, Andrew Halford wrote: Hi All, I have done a IndVal analysis on a dataset which has identified 27 species out of 57 as being significant arbiters of site differences. What I want to do is a PCA biplot (data is hellinger transformed prior to this) with only these 27 species on the ordination, not all 57, How can I control the species I want displayed? cheers Andy -- Andrew, We need a little more information. Which IndVal function did you use? It should return a vector of p-values you can use as a mask [p-val <= 0.05,] on your taxon matrix. But are you saying you want to use all 57 in constructing the PCA and only identify 27, or that you only want to use those 27 in the construction of the PCA? Dave Roberts ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] The final result of TWINSPAN
Thanks Jari, My original thought was to write a wrapper for the original FORTRAN code by replacing the file read of data with data passed from R, and then bringing in the results in a list. That would allow sizing the arrays at run-time and eliminating fixed array sizes. I have a copy of the FORTRAN code that Petr Smilauer modified for simplified input/outputand that helped. Still, it ultimately appeared pretty messy (but still might be the best route), so I tried separating out the subroutines and calling them individually from R. From there I tried replacing some of the subroutines with native R to lower overhead. But in the end I just couldn't understand the code well enough to make it work. So then I thought I should write write a totally transparent version in native R, even if it doesn't replicate the original. On the down side people can say it's not correct; on the upside it's open source and people can evaluate it and modify it as they see fit. So, if there is interest I might post the code and examples on my web page and let somebody else have it to run with. Dave Jari Oksanen wrote: On 27/04/11 00:40 AM, "Dave Roberts" wrote: Earlier this year on an (undoubtedly ill-advised) lark I coded up an R version of TWINSPAN. It's far from a polished package at this point, but the code does run. One of the interesting features is that you can partition a PCO or NMDS in addition to the traditional CA. To be clear, I am not a TWINSPAN fan either, but I wanted it for a methods paper I was working on. The problem is that I based the code on Hill, Bunch & Shaw (1975, J of Ecol 63:597-613) which is what I had available. Apparently the algorithm in the commercial TWINSPAN is significantly modified from the original, but I couldn't find a description of the actual algorithm anywhere in the literature. It is probably described in the User Manual of the software, but I was not sufficiently motivated to chase down a copy. I do have a copy of the FORTRAN code, but it was apparently written in FORTRAN II, and is basically inscrutable, even to an old FORTRAN dog like me. So, if somebody has a clear description of the actual algorithm (and I think it is disturbing that I could not find one), it would be possible to code it up in native R. The alternative, to write a wrapper for the original FORTRAN code is not a trivial task. I gave it a couple of days and gave up. Dave, Hill, Bunch & Shaw describe the general idea of TWINSPAN, but the implementation is more complicated. Martin Kent and Paddy Coker do a great job of explaining the twists in their book ("vegetation description and analysis: a practical approach"). If I remember correctly, the TWINSPAN manual also was more detailed, but I lost it somewhere when I moved around (for the kids: it was a bunch of paper: pdf was not yet invented when TWINSPAN was published). I don't think that the actual TWINSPAN is easily extended beyond CA. Each step is a two-stage one-dimensional ordination on a current subset, where the first stage selects indicators and the second stage is polarized for the indicator species. The final split is based on site ordination and indicators are secondary (which we see in misclassifications if you try to use the provided key for the data that was classified in TWINSPAN). The polarization stage is particularly challenging when working with dissimilarities (PCO, NMDS). I don't think that the FORTRAN I have is completely impenetrable. I think the largest problem is the design principle: R code should run silently and return a result, but TWINSPAN prints when it goes on and returns only a part of the result. Incorporating that in R would need stripping most PRINT and WRITE and have subroutines to return useful data directly. I also wrote a small funny test on TWINSPAN principle, where the splitting and pre-defined pseudospecies where replaced with regression tree split. I'll send you a copy of that and the FORTRAN (IV, I think) code I have in a separate message. Cheers, Jari Oksanen ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] The final result of TWINSPAN
Dear Zoltan, Thanks for the note. The R function I wrote does in fact follow the Roleček et al protocol, and that's partly what motivated the idea to write it up. Lubomír Tichý, Petr Smilauer, and Laco Mucina have all contributed information in the development, but I've still been stymied by the lack of solid information on the actual algorithm. I think it is quite possible to write a function that operates on the principle of TWINSPAN, following Roleček et al, but writing a function that exactly matches the output from the commercial package may prove to be too much trouble. Thanks, Dave Zoltan Botta-Dukat wrote: Dear Dave, This modified version of TWINSPAN may be interesting for you when you compare methods: Modified TWINSPAN classification in which the hierarchy respects cluster heterogeneity Jan Roleček, Lubomír Tichý, David Zelený, Milan Chytrý 2009 Modified TWINSPAN classification in which the hierarchy respects cluster heterogeneity Journal of Vegetation Science 20(4): 596–602 http://onlinelibrary.wiley.com/doi/10./j.1654-1103.2009.01062.x/abstract Zoltan 2011.04.26. 23:40 keltezéssel, Dave Roberts írta: Dear List, Earlier this year on an (undoubtedly ill-advised) lark I coded up an R version of TWINSPAN. It's far from a polished package at this point, but the code does run. One of the interesting features is that you can partition a PCO or NMDS in addition to the traditional CA. To be clear, I am not a TWINSPAN fan either, but I wanted it for a methods paper I was working on. The problem is that I based the code on Hill, Bunch & Shaw (1975, J of Ecol 63:597-613) which is what I had available. Apparently the algorithm in the commercial TWINSPAN is significantly modified from the original, but I couldn't find a description of the actual algorithm anywhere in the literature. It is probably described in the User Manual of the software, but I was not sufficiently motivated to chase down a copy. I do have a copy of the FORTRAN code, but it was apparently written in FORTRAN II, and is basically inscrutable, even to an old FORTRAN dog like me. So, if somebody has a clear description of the actual algorithm (and I think it is disturbing that I could not find one), it would be possible to code it up in native R. The alternative, to write a wrapper for the original FORTRAN code is not a trivial task. I gave it a couple of days and gave up. ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] The final result of TWINSPAN
Dear List, Earlier this year on an (undoubtedly ill-advised) lark I coded up an R version of TWINSPAN. It's far from a polished package at this point, but the code does run. One of the interesting features is that you can partition a PCO or NMDS in addition to the traditional CA. To be clear, I am not a TWINSPAN fan either, but I wanted it for a methods paper I was working on. The problem is that I based the code on Hill, Bunch & Shaw (1975, J of Ecol 63:597-613) which is what I had available. Apparently the algorithm in the commercial TWINSPAN is significantly modified from the original, but I couldn't find a description of the actual algorithm anywhere in the literature. It is probably described in the User Manual of the software, but I was not sufficiently motivated to chase down a copy. I do have a copy of the FORTRAN code, but it was apparently written in FORTRAN II, and is basically inscrutable, even to an old FORTRAN dog like me. So, if somebody has a clear description of the actual algorithm (and I think it is disturbing that I could not find one), it would be possible to code it up in native R. The alternative, to write a wrapper for the original FORTRAN code is not a trivial task. I gave it a couple of days and gave up. -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 On 04/14/2011 01:57 AM, Jari Oksanen wrote: On 14/04/11 10:37 AM, "Yong Zhang"<2010202...@njau.edu.cn> wrote: Dear all, I conducted the two-way indicator species analysis using TWINSPAN program, and following is the final result: 0111 00011011 011000111 01001001 I have to certify my analysis, I want to classify the above 24 sampling sites into 3 major groups based on 7 biotic metrics. The name of my 24 samples could be site1 to site24, from the left to the right, and I set the cut levels 0, 2, 5, 10, 20, the maximum level of divisions: 6, and maximum group size for division:3 . Now, my question is whether my setting is correct? And how should I classify these sites into 3 groups accoding to this final result? Dear Yong Zhang, This is not an R issue, because there is no TWINSPAN in R. However, the answer to your question is that strictly speaking you cannot group your data into three major groups with TWINSPAN. TWINSPAN is a bisection method so that first division gives you two groups, and second splits each of these into two groups so that the next choice is to have four groups. However, in this case one of the groups was so small (3 plots were split off from other in the first division, and then these were split into groups of 2 plots and 1 plot) that you probably can ignore the second division of the small group. If your goal was as vague as wanting to classify 24 sites into 3 major groups you could do better than use TWINSPAN: what's the problem with proper classification methods in R? Moreover, have you checked that your "biotic metrics" suit to the pseudospecies cut level concept of TWINSPAN? Cheers, jari oksanen ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Dissimilarity ranking
Well, it was not only a little sloppy, but a little late. Reading further I saw others had made the same suggestion a while ago. Nonetheless, that will remind me to paste my emails into R and check. Thanks François On 12/09/2010 03:34 PM, francois Guilhaumon wrote: Dear all, I would like to point out three small mistakes in Dave's code, which of course are certainly typos. If one wants to not get only NAs in the result vector, one should specify na.rm=TRUE in the apply statement, furthermore I guess that the mean function has to be applied to 'demodis2' and not 'dimodis' which is an object of class 'dist', finally there is no need to transform 'demodis' again using the as.matrix function in the apply statement: demodis2<- as.matrix(demodis) # make a full matrix copy is.na(diag(demodis2))<- TRUE# ignore the dissimilarity of a # plot to itself apply(demodis2,1,mean,na.rm=TRUE) # calculate the mean dissimilarity # of every plot to the others I checked the code, it's now working. Cheers, François. 2010/12/9 Dave Roberts: Burak, I think your question is simpler than the suggestions of NMDS. One approach, let's say you dissimilarity matrix is called demodis demodis2<- as.matrix(demodis) # make a full matrix copy is.na(diag(demodis2))<- TRUE# ignore the dissimilarity of a # plot to itself apply(as.matrix(demodis),1,mean) # calcualte the mean dissimilarity # of every plot to the others This is all done as part of function disana() in package labdsv along with some other simple dissimilarity analyses. Dave Roberts On 11/23/2010 01:09 PM, Pekin, Burak K wrote: Hello, I want to rank the dissimilarity of sites based on their species composition. For example, I would like to be able to say that site A is less similar in composition to the other sites than site B is similar to the other sites. I could do a cluster analysis and look at which sites are less closely clustered. It would be even better if I could come up with a quantitative scale rather than a relative ranking that would give a value for each site based on its relative dissimilarity to the rest of the sites. So site A might receive a 90 out of 100, whereas site B and C might receive a 60 and a 50 indicating the rank as well as 'relative quantity' of dissimilarity for each site. Thanks, Burak -- Burak K. Pekin, PhD Postdoctoral Research Associate Purdue University [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Dissimilarity ranking
Burak, I think your question is simpler than the suggestions of NMDS. One approach, let's say you dissimilarity matrix is called demodis > demodis2 <- as.matrix(demodis) # make a full matrix copy > is.na(diag(demodis2)) <- TRUE# ignore the dissimilarity of a # plot to itself > apply(as.matrix(demodis),1,mean) # calcualte the mean dissimilarity # of every plot to the others This is all done as part of function disana() in package labdsv along with some other simple dissimilarity analyses. Dave Roberts On 11/23/2010 01:09 PM, Pekin, Burak K wrote: Hello, I want to rank the dissimilarity of sites based on their species composition. For example, I would like to be able to say that site A is less similar in composition to the other sites than site B is similar to the other sites. I could do a cluster analysis and look at which sites are less closely clustered. It would be even better if I could come up with a quantitative scale rather than a relative ranking that would give a value for each site based on its relative dissimilarity to the rest of the sites. So site A might receive a 90 out of 100, whereas site B and C might receive a 60 and a 50 indicating the rank as well as 'relative quantity' of dissimilarity for each site. Thanks, Burak -- Burak K. Pekin, PhD Postdoctoral Research Associate Purdue University [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] some MDS and R problems
Arnaldo, Given that NMDS operates essentially on rank distances, small variations in dissimilarity generally have low impact. I rarely have points with zero dissimilarity on quantitative indices, but it sometimes happens on presence/absence based indices. If you have a dissimilarity object named dis, I just do > dis[dis==0] <- 0.0001 > res <- nmds(dis) Following Jari's argument about the other point's contribution to stress, I think fudging one dissimilarity is a better solution than dropping out a point Dave Roberts Arnaldo Russo wrote: Hi all, I'm executing some multivariate exploratory annalysis, with vegan and MASS packages. Some problems are not solved for me at this moment. I have read all past discussions over MDS problems. In my studies occurs a situation that some samples are equal each other, and this results in zero dissimilarities. I tryed with no success, use the stepacross function. So, in this case I changed one of these zero dissimilarities values for a minimum (1e-5). Some of these modifications are different than "stepacross" or same isoMDS(x.dist + 1e-5) (proposed by Jari oksanen to 'lie' to isoMDS? When I delete a sample I do not loose some weight? Changing one of those zero dissimilarities and using zerodist = "add" option of metaMDS, it passed the current error (zero or negative distance between objects). Someone could explain the function posted by Gavin Simpson (Jan 12, 2010; 05:45pm Re: Non-metric multidimensional scaling (NMDS) help). I didn't get some result. If there is a good reason, and you want to include all samples, then you'll need to come up with a means for handling them. metaMDSdist allow you to add a small value to the zero dissimilarities. The details are in the code, but effectively all zero distances are replaced by half the smallest non zero distance. You could do a similar replacement yourself if you feel this is warranted and/or justified. minDij <- min(Dij[Dij > 0) / 2 Dij[Dij <= 0] <- minDij Will do this replacement if Dij is your matrix (replace Dij with whatever the name of your matrix is). Then supply the new matrix to metaMDS. " When I use unique function as #x.dist <- dist(unique(X)) this cutted some of my samples that I do not know. I can't see my variables in the plot. Some help? Cheers, Arnaldo. epi<- read.table("epi.txt", h=TRUE,row.names=1) library('vegan') library('MASS') epi.<- as.matrix(epi) dissim1<- vegdist(epi., method="jaccard", binary=TRUE) dissim1.mds<-metaMDS(dissim1,k=2) plot(dissim1.mds,type="n") text(dissim1.mds, labels=as.character(row.names(epi))) #epi samplesp1sp2sp3sp4sp5sp6sp7sp8sp9 sp10sp11sp12sp13sp14sp15sp16sp17sp18sp19 1E10100000000000 00000 2E00101000000000 00000 3E10100000000000 00000 4E10101000000000 00000 5E10000000000000 00000 6E10000000000000 00000 7M10100010000000 00000 8E10000010000000 00000 11E10001010000000 00000 12E00000100000000 00000 13E00000010001100 00000 14E00100000000100 00000 15E10000010000010 00000 17E00001010100000 00000 18E00000010000000 00000 20E10001000000000 00000 21E10000000000001 00000 22E10001000100000 00000 23E10111000000000 00000 27E10011000000000 00000 28E10001011001000 00000 30E11001000000000 0
Re: [R-sig-eco] multivariate smoothing and gradient estimation
Chris, One (sub-optimal) solution would be to fit GAMS and then have the gam.predict estimate values immediately near your data points which you could use to calculate a local gradient. If the GAM is reasonably smooth, I would think you could get estimates that were reasonable. Dave -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 Chris Martin wrote: Dear list members, I am a looking for a function that can calculate a "surface" from at least four predictor variables and one response variable. I would then like to calculate the first derivative of a specific point on this surface. I have looked at many packages for nonparametric smoothing and kernel density estimation but have been unable to find any that fulfill both these criteria. For example, loess can handle multivariate data, but I do not how to extract the derivative from the resulting fit? Many smoothing splines offer predict functions to extract the derivative, but these functions can only handle univariate data (e.g. smooth.spline). Ideally, I would like to use local estimates of the surface (i.e. loess). I would appreciate any suitable functions or advice on where to look for functions that fulfill both these criteria. Thank you very much for your time. best wishes, Chris Martin Population Biology Graduate Group '12 University of California, Davis [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology - ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Teaching materials
Volker, You're welcome to take look at my site: http://ecology.msu.montana.edu/labdsv/R It's primarily for multivariate analysis in community ecology, bit does also have some stuff on GLMs and GAMS in an ecological context. You might also visit with Hank Stevens at Miami as he's just next door to you. Dave - David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 On Wed, 2009-08-26 at 16:39 -0400, Volker Bahn wrote: > Hi all, > > I'll be teaching a 400/600 level bio/ecostatistics class using R this > fall (using "Introductory Statistics with R" as text book). I was > wondering if anyone here had any teaching material (lecture slides, > exercises, homework, projects, code for teaching etc) that they would > like to share? Pointers to material on the web would also be welcome. I > found a few courses with course material on the web but they were > typically programming oriented and in health sciences rather than > ecology. If you'd rather reply off list, I'll summarize the responses > for the list. > > Thanks, > > Volker > > Biological Sciences > Wright State University > > ___ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Fuzzy-Set-Ordination for classifying plant species???
Dragos, I'm sorry but I do not understand your question. Are you creating a dissimilarity matrix of lakes or species? I suspect that a transposing of your matrix might work, but I need to understand your problem better. Please send more detail Thanks, Dave On Tue, 2009-08-18 at 02:55 -0700, Dragos Zaharescu wrote: > > Hello, > > I have a matrix with 180 plant species (variables, binary) and 270 rows > (altitude waterbodies). There is also one categorical variable (4 categories > representing similar lakes groups, which resulted from a prior analysis of > the transposed matrix). > Is there any way to load plant species on those categories in > fuzzy-set-ordination analysis? Is there anyone here that has faced this > question before or could provide a hint? All my efforts so farhave lead to > loading the lakes on the categories (fso in R); but I want to load the plant > species. > > I would greatly appreciate your help. > > Dragos Zaharescu > Vigo University > > ~ You should be the change you want to see in the world~ Ghandi > > > > [[alternative HTML version deleted]] > > ___ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- David W. Roberts office 406-994-4548 Professor and Head FAX 406-994-3190 Department of Ecology email drobe...@montana.edu Montana State University Bozeman, MT 59717-3460 ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Clustering large data
Thierry and Hadley, Sorry to be late coming into this (I forgot I subscribed to sig-eco). package labdsv has a function called matrify() which takes a three column data.frame (sample,taxa,abundance) and creates a full (sparse) matrix representation. I've never tried it on a data set as large as yours, and I'm curious if it would work. It's pure R, but if worst comes to worst I used to have a FORTRAN version that would probably work. Please give matrify a try and let me know. Dave R. matrify <- function (data) { if (ncol(data) != 3) stop("data frame must have three column format") plt <- data[, 1] spc <- data[, 2] abu <- data[, 3] plt.codes <- levels(factor(plt)) spc.codes <- levels(factor(spc)) taxa <- matrix(0, nrow = length(plt.codes), ncol = length(spc.codes)) row <- match(plt, plt.codes) col <- match(spc, spc.codes) for (i in 1:length(abu)) { taxa[row[i], col[i]] <- abu[i] } taxa <- data.frame(taxa) names(taxa) <- spc.codes row.names(taxa) <- plt.codes taxa } hadley wickham wrote: Hi Thierry, Thanks for the more detailed report. I think the new version of reshape will help, but I just checked and it's current a total mess and will need a lot of work before it's ready for anyone to try. Unfortunately I'm unlikely to get to it until the ggplot2 book is finished, so it might be a bit of a wait. Hadley On Tue, Oct 14, 2008 at 2:52 AM, ONKELINX, Thierry <[EMAIL PROTECTED]> wrote: Hi Hadley, Here is a more elaborate report of what I did and what when wrong. The example is not reproducible because the dataset is to large. A smaller dummy dataset is not an option as it works with smaller datasets. I'm willing to run the code again with a development version of reshape. Cheers, Thierry library(RODBC) library(reshape) Loading required package: plyr setwd("d:/wouter") Sys.info() sysname release "Windows" "XP" version nodename "build 2600, Service Pack 2" "LHPA000838" machinelogin "x86" "thierry_onkelinx" user "thierry_onkelinx" sessionInfo() R version 2.7.2 (2008-08-25) i386-pc-mingw32 locale: LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Du tch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252 attached base packages: [1] stats graphics grDevices datasets tcltk utils methods [8] base other attached packages: [1] reshape_0.8.1 plyr_0.1 RODBC_1.2-3svSocket_0.9-5 svIO_0.9-5 [6] R2HTML_1.59svMisc_0.9-5 svIDE_0.9-5 loaded via a namespace (and not attached): [1] tools_2.7.2 channel <- odbcConnectAccess("db1.mdb") km <- sqlQuery(channel = channel, query = "SELECT KMhokcode AS Location, TaxonFK AS Species FROM kmhok_periode2_selectie ORDER BY KMhokcode, TaxonFK", as.is = TRUE) odbcCloseAll() km$value <- 1 dim(km) [1] 1157024 3 length(unique(km$Location)) [1] 6354 length(unique(km$Species)) [1] 1381 system.time(tmp <- cast(Location ~ Species, data = km[1:1000, ], fill = 0)) user system elapsed 0.110.000.17 system.time(tmp <- cast(Location ~ Species, data = km[1:1, ], fill = 0)) user system elapsed 1.7 0.0 1.7 system.time(tmp <- cast(Location ~ Species, data = km[1:10, ], fill = 0)) user system elapsed 46.420.45 47.02 system.time(tmp <- cast(Location ~ Species, data = km, fill = 0)) Error: cannot allocate vector of size 33.5 Mb Timing stopped at: 322.95 3.43 327.4 system.time(tmp <- table(km$Location, km$Species)) user system elapsed 1.100.001.11 ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 [EMAIL PROTECTED] www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Namens hadley wickham Verzonden: vrijdag 10 oktober 2008 14:40 Aan: ONKELINX, Thierry CC: r-sig-ecology@r-project.org Onderwerp: Re: [R-sig-eco] Clustering large data Thanks for your responses. The biggest problem seems to be cast() for the reshape package which could not handle the dataset. Peter's s