### Re: [R-sig-eco] fonction spc.pres in labdsv package

Hi Lisa,

Sorry, I'm several days behind in monitoring the list. I'm not sure I can help without a copy of your data. Since your data frame appears to have both strings ("i", "p", etc.) and integers, I'm not sure what you're getting. table(unlist(veg)) ignores NAs, although I think vegtrans() would have warned you about NAs. If the data file is too large to send as an attachment (to me, not the list), perhaps you could show snippets. E.g.

    veg <- read.csv(file=file.choose(), dec=",", sep=";", header=TRUE)
    any(is.na(veg))
    veg[1:10,1:10]

and then

    newveg <- vegtrans(veg,x,y)
    any(is.na(newveg))
    newveg[1:10,1:10]

and then

    spc.pres <- apply(newveg>0,2,sum)
    any(is.na(spc.pres))

Dave Roberts

On 04/19/2013 06:49 AM, lisa couet wrote:

Hi, I have an issue concerning the function spc.pres in package labdsv. The message is:

    Error in plot.window(...) : need finite 'xlim' values
    In addition: Warning messages:
    1: In min(x) : no non-missing arguments to min; returning Inf
    2: In max(x) : no non-missing arguments to max; returning -Inf
    3: In min(x) : no non-missing arguments to min; returning Inf
    4: In max(x) : no non-missing arguments to max; returning -Inf

My code:

    veg <- read.csv(file=file.choose(), dec=",", sep=";", header=TRUE)
    attach(veg)
    # to change my data into Braun-Blanquet index
    library(labdsv)
    x <- c("i","p","p1","p2","p3",1,12,13,14,15,21,22,23,24,25,31,32,33,34,35,41,42,43,44,45,51,52,53,54,55)
    y <- c(0.5,3.0,3.0,3.0,3.0,15.0,15.0,15.0,15.0,15.0,37.5,37.5,37.5,37.5,37.5,62.5,62.5,62.5,62.5,62.5,85.0,85.0,85.0,85.0,85.0,97.5,97.5,97.5,97.5,97.5)
    newveg <- vegtrans(veg,x,y)
    # abundance distribution
    ab <- table(unlist(newveg))
    # (the abundance table displays fine)
    # number of occurrences
    spc.pres <- apply(newveg>0,2,sum)
    plot(sort(spc.pres))

The error message appears here. I have many NAs because it is a file where columns are species and rows are elevations. So when a species is not present, the cell is empty.

I hope one of you can help me, kind regards, lisa

--
David W. Roberts office 406-994-4548
Professor and Head FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
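A minimal sketch of one way the plot in this thread can fail, using a small made-up data frame (`toy` and its values are hypothetical, not Lisa's data). If every species column contains at least one NA, every column sum is NA, `sort()` silently drops the NAs, and `plot()` is left with nothing to draw, which would be consistent with the 'xlim' error quoted above. Passing `na.rm = TRUE`, or recoding NA as 0, is one possible remedy, assuming an empty cell really means the species was absent.

```r
# Toy cover table: rows are elevations, columns are species (hypothetical values)
toy <- data.frame(sp1 = c(15, NA, 3),
                  sp2 = c(NA, 37.5, NA),
                  sp3 = c(0.5, 3, NA))

# Occurrence counts as written in the thread: one NA makes the whole column sum NA
spc.pres.na <- apply(toy > 0, 2, sum)
length(sort(spc.pres.na))   # sort() drops NAs, so nothing is left to plot

# Counting only the non-missing cells gives usable occurrence counts
spc.pres <- apply(toy > 0, 2, sum, na.rm = TRUE)
sort(spc.pres)

# Alternative, assuming an empty cell means "absent": recode NA as 0 first
toy0 <- toy
toy0[is.na(toy0)] <- 0
spc.pres0 <- apply(toy0 > 0, 2, sum)
```

Whether NA should be treated as zero cover is a data question rather than a coding one, so the recoding step is worth checking against how the field sheets were filled in.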

### Re: [R-sig-eco] Zuur / Pinierho / Faraway

Philip,

Is there an online errata, or do you just have to be smart and diligent?

Thanks, Dave

On 11/29/2012 07:30 AM, Dixon, Philip M [STAT] wrote:

I agree with all the previous comments and second Tom's recommendation of Faraway as an 'in between' Zuur and Pinheiro & Bates. One thing to be careful of: while the advice in Faraway is sound, there are more than a few mistakes in his equations.

Philip Dixon

### Re: [R-sig-eco] IndVal groups Plotting an RDA with labels

Caitlin,

Depending on the size of your data set, one way to identify clusters is to label the dendrogram with cluster number. Let's say your Ward's result is called 'ward.hcl', and your cluster membership (perhaps returned from cutree) is in a vector called 'clustid'. Then

    plot(ward.hcl,labels=clustid)

will produce the dendrogram with the cluster number identified on the bottom. If the number of plots is too high the labels overwrite each other and it gets hard to read, but usually you can make it out if you stretch the plot. Alternatively, size is sometimes indicative, and you can simply

    table(clustid)

and see if cluster size is unique.

Dave Roberts

On 07/03/2012 07:12 AM, Caitlin Porter wrote:

Dear all, I am working with a large data set to examine plant community structure on barrens habitats and have a couple of questions regarding IndVal (labdsv package) on Ward's clusters (hclust function in stats package) and plotting axes of an RDA result with plot, plot.cca or eqscplot functions (vegan, MASS, graphics packages).

1. I have run IndVal (package labdsv) on a Ward’s Cluster Analysis to determine indicator species for groups. I selected k=5 groups as my most meaningful and conservative (sample size, etc.) classification. However, according to my average silhouette width, k=12 is the optimal number of groups. I ran an IndVal on k=12 out of curiosity and noticed some interesting patterns I would like to explore further, but I am having difficulty understanding the output. The groups are labeled in the indicator species output as “group 1,2,3,4….12”. I do not understand how to determine which group is associated with which cluster (e.g. in my Ward’s dendrogram, which is group 11?). It is somewhat obvious when k is relatively small because of the order the groups are clustered in, branch height in the dendrogram, and species frequency and abundances; however, I’d like to know for larger numbers of groups. It would be ideal if I could even label the dendrogram with these groups.
I’ve seen examples of these in a couple of papers with color coded boxes, but I can’t seem to figure out how to code it myself.

2. My second question relates to plotting an RDA. I have been able to run an RDA in the vegan package successfully but am unable to plot it in a way that I can interpret. I need to label sites, species (response matrix) and environmental variables/PCA axes (explanatory matrix). So far, I’ve only been able to label either the response matrix or the explanatory matrix in my graphs, but not all 3 sets of points. I’ve tried modifying the plot function and code from Borcard et al. 2011 (Numerical Ecology with R), eqscplot code for the MASS package and plot.cca in the vegan package. I would prefer to use eqscplot since I already understand better how to customize it, but I’m just looking to get any graph I can read at this point. When I use the plot function from Borcard et al. I see PC axes names only. When I use eqscplot, I see species names only. I also tried plot.cca in the vegan package but wasn’t able to call up a graph. This code looks like a great way to do it, but I’m not sure what I’m doing, e.g. what to put in for const. or what the ‘unexpected symbol’ error means.

This old thread asks a similar question ( https://stat.ethz.ch/pipermail/r-help/2009-February/188282.html), but I’m not sure I understand its solution, and I have approximately 300 species so providing a separate name for each individually might not be feasible. This other thread asks another similar question ( http://r.789695.n4.nabble.com/RDA-Triplot-td3055474.html) but the author finds an error is generated specifying that biplot is not an appropriate method. I have included my code for question 2 below. Any help would be very much appreciated!
Sincere thanks, Caitlin

    # eqscplot (MASS package)
    library(MASS)
    #subset species and sites scores from the rda for first 10 RDA axes
    sr <- scores(c1.rda, display = c("sites", "species"), choices = c(1,2,3,4,5,6), scaling = 2)
    sites.only <- as.data.frame(sr$sites)
    srsp <- as.data.frame(sr$species) # data frame with just the species in it
    c1.site <- c1[,1] # object with just the site names from the original data set
    cp.m <- merge(c1.site, sites.only, by=0, sort=FALSE) # merged site object with site names
    eqscplot(cp.m$RDA1, cp.m$RDA2, xlim=c(-1, 1), ylim=c(-1, 1), col="blue", xlab="RDA Axis 1", ylab="RDA Axis 2", cex=0.3) # defining the variables limits of plot and what the symbols look like
    text(cp.m$RDA1, cp.m$RDA2, labels=cp.m$x, col="black", cex=0.3) #adding names on plots
    text(srsp$RDA1, srsp$RDA2, labels=rownames(srsp), col="red", cex=0.3) #adding names/species
    # Error in text.default(cp.m$RDA1, cp.m$RDA2, labels = cp.m$x, col = "black", : zero length 'labels'

    # Borcard et al. 2011 plot function
    plot(c1.rda, main="Triplot RDA - scaling 2 - wa scores")
    spe.c1 <- scores(c1.rda, choices=1:6, scaling=2, display="sp")
    arrows(0,0,spe.c1[,1], spe.c1[,4], length=0, lty=1, col="red")
    c1.rda.species <- as.data.frame(c1
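The dendrogram labelling suggested in the reply, together with the colour-coded boxes Caitlin mentions, can be sketched with base R alone. This is only an illustration on simulated data: `ward.hcl` and `clustid` follow the names used in the reply, while `dat`, `boxes` and `box.id` are made up here. It assumes a recent R, where Ward's method is requested as `"ward.D2"` (older versions used `"ward"`), and that the group numbers reported by IndVal are the same integers that `cutree()` returned.

```r
# Simulated data standing in for the real site-by-species matrix
set.seed(1)
dat <- matrix(runif(30 * 5), nrow = 30)
ward.hcl <- hclust(dist(dat), method = "ward.D2")
clustid <- cutree(ward.hcl, k = 5)

# Label each leaf with its cluster number, as in the reply
plot(ward.hcl, labels = clustid, cex = 0.7)

# Draw a coloured box around each of the k clusters; the (invisible) return
# value lists the members of each box, from left to right in the dendrogram
boxes <- rect.hclust(ward.hcl, k = 5, border = 2:6)

# cutree() cluster number for each box, in dendrogram order
box.id <- sapply(boxes, function(members) unique(clustid[members]))
box.id

# Cluster sizes, the other check suggested in the reply
table(clustid)
```

With `box.id` in hand, the left-to-right position of any group (for example group 11 when k = 12) can be read off directly instead of being inferred from branch heights.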

### Re: [R-sig-eco] [R] Component analysis / cluster analysis of multiple sites based on soil characteristics

Sacha,

I do not fully understand your objectives, but there are several things to bear in mind in your approach below. You refer to your result object as water.pca, but it's simply a distance matrix, not a PCA. More problematic, perhaps, is that it's calculated on a matrix with very different values for the columns, e.g. temp is 30 and no2 is 0.01. In calculating Euclidean distance (the default for dist()) these scales matter a lot. If it's truly a clustering of sites based on these attributes you want, you should standardize the columns before calculating dist(). Once you have a distance matrix from the standardized data you could use pam or agnes (as you have already done) but might also want to see an ordination. Given a Euclidean distance matrix I would recommend Principal Coordinates Analysis (PCO or PCoA depending on source), which I believe is available in the ecodist package you already have loaded.

Dave Roberts

On 01/23/2012 05:51 AM, Sacha Viquerat wrote:

Hello dear list! Maybe I am demanding too much, but I am having problems finding the right way to tackle a seemingly trivial problem: We counted fish at different sites. In order to assess habitat quality at each site, we sampled temperature, pH etc. at each site, resulting in 243 observations of 8 independent variables. As we would like to identify clusters within this data set, we stumbled upon three approaches: two as realized in package cluster, using dist to create a distance matrix from our numeric variables and then pam to produce a model, or agnes and then various tree methods to simplify the tree, as well as an approach via the ecodist package (using distance and pco). While results obtained through the cluster package were the same (phew!), the result from the ecodist approach did not identify clusters at all. As we are all confused and I am the one in charge of deciding which way to go, and as I am the one most confused after all, I am completely lost.
Doing such an analysis for the first time, I would be satisfied with the pam approach identifying 2 clusters (via iterating over each k in 2:10 and picking the max average silhouette of each model). However, as there are so many different approaches out there, I am not sure if all the assumptions are met! It seems for example that pH is more or less randomly distributed. Should we keep such variables? How can I access the actual loadings of the principal axis of the pam model? Couldn't find that anywhere! In the end, there are only 33 observations in the 2nd group, which will make the further analysis of fish counts heavily unbalanced. Any suggestions?

Code snippet:

    water.par
        temp  pH   DO   BOD  COD   no3   no2  po4
    1   33.5 7.4 5.30  4.04 15.0 0.120 0.008 0.20
    2   33.5 7.4 5.30  4.04 15.0 0.120 0.008 0.20
    . . .
    243 29.1 7.4 6.80 12.56 45.0 0.740 0.002 0.32

    water.pca <- dist(as.matrix(water.par))
    k=best.k(water.pca,c(2,10),stand=T,trace=1) #finding the k with highest average silhouette dist
    clus.model <- pam(water.pca,k,stand=T)
    clus.model$clusinfo
      size max_diss  av_diss diameter separation
    1  210 30.12712 8.445552 42.29439   27.88689
    2   33 12.74630 7.725452 21.91972   27.88689

    water.md <- distance(water.par, "euclidean")
    water.pco <- pco(water.md)
    plot(water.pco$vectors[,1], water.pco$vectors[,2])

Thanks in advance and sorry for the verbosity level at max!!!
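The advice in the reply (standardize the columns, then compute distances, then ordinate and cluster) can be sketched in base R as follows. The data are simulated and only three of the eight variables are mimicked; `water.std`, `d.raw`, `d.std` and `grp` are names invented for this sketch. `cmdscale()` is used here as a base-R stand-in for the PCO in ecodist, and `hclust()` as a stand-in for pam/agnes, so this illustrates the workflow rather than reproducing the original analysis.

```r
# Simulated water-quality table with columns on very different scales
set.seed(42)
water.par <- data.frame(temp = rnorm(50, 30, 2),
                        pH   = rnorm(50, 7.4, 0.2),
                        no2  = rnorm(50, 0.01, 0.003))

# Unstandardized Euclidean distance is dominated by the widest-ranging column
d.raw <- dist(water.par)

# Standardize each column to mean 0 and standard deviation 1 first
water.std <- scale(water.par)
d.std <- dist(water.std)

# Principal Coordinates Analysis on the standardized distances
water.pco <- cmdscale(d.std, k = 2, eig = TRUE)
plot(water.pco$points[, 1], water.pco$points[, 2],
     xlab = "PCO 1", ylab = "PCO 2")

# Clustering on the same standardized distances
grp <- cutree(hclust(d.std, method = "average"), k = 2)
table(grp)
```

Comparing clusterings built from `d.raw` and `d.std` on the real data is a quick way to see how much of the original two-cluster result was driven by the scale of a single variable.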

### Re: [R-sig-eco] Recommended R package for analyzing community data set? - nested design, stratified random sampling, covariates

Dear Laura,

David is certainly correct that vegan would provide a wealth of tools for the analysis of your data. More generally, if you go to CRAN and browse the Environmetrics Task View you will see a review of many packages suitable for specific analyses you may be interested in.

Dave Roberts

On 01/15/2012 12:10 PM, Laura S wrote:

Dear all: May you recommend an R package for analyzing this data set? I would greatly appreciate any thoughts you can provide.

I. Study goals

This study examines soil crust (lichens and bryophytes) recovery and succession in fields that underwent different levels of disturbance.

II. Variables

Response variables of interest: soil crust cover (categorical scale, described below), species richness, species composition

Explanatory variable of interest: disturbance regime (categorical variable)

Environmental variables measured (covariates, a mix of categorical and numerical variables): cover of mineral soil, litter, vascular plant bases, stones, or rocks, slope, aspect

III. Study sampling and design

Eight research areas (BR, CB, CC, JL, PC, PL, SL, TR)

Within each research area subplots were assigned six disturbance treatments (NC, NS, OC, OS, SC, SS) based on disturbance history

A single transect was placed randomly in the center of each subplot and sampled in twenty 20 x 20 cm plots at 1 m intervals along the transect

47 of 48 possible treatment subplots were sampled (n=6 for 7 sites, n=5 for 1 site)

Sampling cover scale:

    Scale value   Representative % cover
    1             <= 1
    2             1-4
    3             4-10
    4             10-25
    5             25-50
    6             50-75
    7             75-95
    8             95-100

There were three different sampling times (spread over two years), but time of sampling was not considered as a confounding factor given the way sampling was conducted with the particular communities studied (soil crust communities).

Total species positively identified: 33 taxa (species and species groups), 15 of these were in four or more of 47 subplots (n=6 x 7, n=5 x 1)

Unidentified collected taxa: less than 0.5%

Approximate taxa pool (species observed in entire areas, but not necessarily in sample plots): 26 lichen + 21 bryophytes = 47 taxa

Thank you for your time and consideration, Laura

### Re: [R-sig-eco] The problems on BIO-ENV procedure by [R]

Shun Tsuboi,

isoMDS (and routines based on it) will not allow zero dissimilarity, which implies perfect replicates. One alternative is to remove one of the replicates, but that may have an unsatisfactory effect on further analyses. Alternatively, if you are sure you want to keep the duplicate plots you can change the zero to a small value

    d[d==0] <- 0.0001

and run isoMDS on the resulting revised dissimilarity matrix. Undoubtedly someone from the vegan group will respond to your second question.

Dave Roberts

On 12/05/2011 01:28 AM, 坪井 隼 wrote:

Dear Madam / Sir, I have two questions about the use of the “R” program for ecological research. I am studying the relationship between the community structures of environmental microbes and some environmental conditions. For this objective, I have learned that the BIOENV procedure, which was developed by Clarke & Ainsworth (1993), is available in the “R” software.

First question: I attempted to use the procedure to analyze the relationship between the variation of the microbial community structures and the environmental factors. However, I cannot analyze the relationship based on the isoMDS function. isoMDS did not accept my dataset. The commands for the BIOENV procedure, which I programmed, and the error message I received were as follows:

    library(MASS)
    library(vegan)
    communitydat <- read.table("C:/Documents and Settings/shuntsuboi/desktop/bray.txt", header=T)
    environdat <- read.csv("C:/Documents and Settings/shuntsuboi/desktop/ev.csv", header=T)
    env <- environdat[,c("variablesA","variablesB.","variablesC","variablesD","variablesE")]
    d <- vegdist(communitydat, "bray")
    isoMDS(d)
    error isoMDS(d) : zero or negative distance between objects 1 and 2

As shown above, I cannot run the program because the error “isoMDS(d) : zero or negative distance between objects 1 and 2” occurred. What are the ways to solve this problem? In the isoMDS function, is there a way to make a zero Bray-Curtis distance acceptable?

Second question: Based on the commands above, I ran the metaMDS function. However, although I could automatically draw the two-dimensional ordination plot, I could not obtain the X and Y values of the respective plots. The following message was shown:

    “In ordiplot(x, choices = choices, type = type, display = display, : Species scores not available”

What are the ways to solve this problem?

Sincerely yours, Shun Tsuboi
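The zero-replacement workaround from the reply can be tried on simulated data as below. This is a sketch under stated assumptions: `comm` is a made-up abundance matrix with one deliberately duplicated row, and base R's `dist()` stands in for `vegdist(communitydat, "bray")` so the example does not depend on vegan. The value 0.0001 is the one suggested in the reply; how small it should be relative to the other dissimilarities is a judgment call.

```r
library(MASS)  # provides isoMDS()

# Simulated abundance matrix in which rows 1 and 2 are exact duplicates
set.seed(7)
comm <- matrix(rpois(10 * 6, lambda = 5), nrow = 10)
comm[2, ] <- comm[1, ]

# Euclidean distance as a stand-in for the Bray-Curtis dissimilarity
d <- dist(comm)
sum(d == 0)   # the duplicated pair produces a zero dissimilarity

# Replace zeros with a small positive value, as suggested in the reply
d[d == 0] <- 0.0001

# isoMDS now accepts the matrix; the coordinates are in fit$points
fit <- isoMDS(d, k = 2, trace = FALSE)
head(fit$points)
```

The X and Y coordinates of the samples are the rows of `fit$points`, which may also be useful for the second question in the quoted message.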

### Re: [R-sig-eco] Multivariate ANOVA/repeated measures

On 10/07/2011 08:51 AM, Dr N.A. Cutler wrote:

Dear All, I have a query about multivariate analysis of community data. In my experiment, 24 microbial communities in different locations were sampled using Sampling Technique 1 (ST1). A site X species matrix was then derived by molecular analysis. The same 24 locations were then sampled again using a different sampling technique (ST2) and a second site X species matrix was derived. It is assumed that community structure remains intact after sampling by Technique 1, i.e. the two techniques can sample from the same pool of organisms.

I want to compare the results of the two sampling exercises in order to test the performance of the two sampling techniques. My research question is: does Technique 1 produce a similar signal to Technique 2? Or do the different techniques give significantly different pictures of community structure? The null hypothesis is that there is no significant difference between the two sampling techniques, i.e. they both capture community structure with the same degree of accuracy.

It occurred to me that I could use a multivariate ANOVA technique (e.g. Adonis) to distinguish between the results of the two sampling exercises, using sampling technique as a factor. But I am not sure how to deal with the obvious correlation between sample pairs. Should this situation be addressed as a repeated measures experiment with two time steps? If so, what is the best technique to use (a mixed model, perhaps?) Any advice would be gratefully received.

Best wishes, Nick Cutler

Nick,

I would try something pretty direct. Any appeal to differences in dissimilarities confounds the effects with the particular dissimilarity/distance matrix you use. Assuming the samples and species are in the same order, and that the data.frames are the same size, you might try

    actual <- sum((ST1-ST2)^2)

and then permute one of the two matrices numerous times

    res <- rep(NA,999)
    for (i in 1:999) {
        res[i] <- sum((ST1-ST2[sample(1:nrow(ST2),replace=FALSE),])^2)
    }
    final <- (sum(res <= actual) + 1)/1000

and see what fraction of the permuted matrices are as similar. Hopefully Gavin will weigh in with a better randomization. If you do go with a multivariate approach I might try a Procrustes analysis of PCO ordinations.

Dave
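The permutation idea in the reply can be run end to end on simulated data. In this sketch `ST1` and `ST2` are made-up 24 x 10 count matrices in which ST2 is ST1 plus a little noise, so the site pairing carries real signal; `n.sites`, `n.spp` and `n.perm` are names introduced here. With real data the two matrices would need identical row and column order, as the reply points out.

```r
# Simulated site-by-species matrices; ST2 is ST1 plus noise, so pairs match
set.seed(123)
n.sites <- 24
n.spp <- 10
ST1 <- matrix(rpois(n.sites * n.spp, lambda = 5), nrow = n.sites)
ST2 <- ST1 + matrix(rpois(n.sites * n.spp, lambda = 1), nrow = n.sites)

# Observed sum of squared differences between paired samples
actual <- sum((ST1 - ST2)^2)

# Null distribution: shuffle the rows of ST2 so the site pairing is broken
n.perm <- 999
res <- numeric(n.perm)
for (i in seq_len(n.perm)) {
  res[i] <- sum((ST1 - ST2[sample(nrow(ST2)), ])^2)
}

# Fraction of permutations at least as similar as the observed pairing
final <- (sum(res <= actual) + 1) / (n.perm + 1)
final
```

A small value of `final` says the two techniques agree site by site more closely than randomly re-paired sites would. That supports concordance between the techniques, which is a slightly different statement from testing for a difference between them.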

### Re: [R-sig-eco] Multivariate ANOVA/repeated measures

On 10/10/2011 02:15 PM, Gavin Simpson wrote:

On Mon, 2011-10-10 at 09:11 -0700, Rich Shepard wrote:

On Mon, 10 Oct 2011, Dave Roberts wrote:

I want to compare the results of the two sampling exercises in order to test the performance of the two sampling techniques.

I would try something pretty direct. Any appeal to differences in dissimilarities confounds the effects with the particular dissimilarity/distance matrix you use. Assuming the samples and species are in the same order, and that the data.frames are the same size, you might try

I did not read the original message, so I hope you'll allow me to join the thread. My recommendation is to use univariate tree models, particularly a classification tree (for ordinal explanatory variables; i.e., ST1 and ST2).

But the response here is *multivariate*. Of course, one could use Glen De'Ath's multivariate regression trees (despite the name it is really a constrained clustering/classification), but I think there are better ways of solving this particular problem. And unless one has many 100s of observations, the model will need some sort of variance reduction applied (via bagging, or some such) as the one fitted model is potentially highly unstable. G

This is fully, carefully, and non-technically explained in Chapter 9 (particularly Sections 9.3 and 9.4) in Zuur, Ieno, and Smith, Analysing Ecological Data. For that matter, I highly recommend reading the whole book. Rich

It would be fairly simple to boil down to a univariate question. You could do something as simple as a paired t-test of plot-level species richness or the number of individuals sampled (to compare sampling efficiency), but I still don't see an independent and a dependent variable.

Dave
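The paired t-test of plot-level richness mentioned in the last reply could look like the sketch below. The matrices are simulated (`ST1`, `ST2` as 24 x 15 count tables with made-up Poisson rates), and `rich1`, `rich2` and `tt` are names introduced here; richness is taken as the number of taxa with non-zero abundance at each site.

```r
# Simulated site-by-species abundance matrices for the same 24 sites
set.seed(99)
ST1 <- matrix(rpois(24 * 15, lambda = 1), nrow = 24)
ST2 <- matrix(rpois(24 * 15, lambda = 1.5), nrow = 24)

# Plot-level species richness: taxa with non-zero abundance per site
rich1 <- rowSums(ST1 > 0)
rich2 <- rowSums(ST2 > 0)

# Paired test, because both techniques sampled the same 24 locations
tt <- t.test(rich1, rich2, paired = TRUE)
tt$estimate   # mean of the paired differences in richness
tt$p.value
```

Replacing `rowSums(ST > 0)` with `rowSums(ST)` gives the same comparison for the number of individuals sampled, the other efficiency measure named in the reply.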

### Re: [R-sig-eco] help with analysis

Hi Kátia,

Sorry to be so late; just back in the office. Pedro is correct that adonis will help establish statistical significance of potential differences, but I think NMDS could still be very helpful. One approach would be to code the sites by glyph (e.g. site 1 = circle, site 2 = triangle, etc.) and then to draw arrows from the first date to the second, second to third, etc., for each site. In labdsv you could do this, if your nmds is called nmds.object, with

    plot(nmds.object,type='n')

to draw the axes but not the points, and then use

    points(nmds.object,site==1,pch=1)
    points(nmds.object,site==2,pch=2)

etc. Then, depending on how the sites are sorted, the arrows could be drawn with

    arrows(nmds.object$points[1,1],nmds.object$points[1,2],nmds.object$points[2,1],nmds.object$points[2,2])

to draw an arrow from the first point to the second. You might need to mess with the parameters of arrows() to get the arrow sizes you want. In vegan you could do

    nmds.plot <- plot(metaMDS(dissimilarity or taxon matrix),type='n')
    points(nmds.plot$sites[site==1,])
    points(nmds.plot$sites[site==2,],pch=2)
    etc.
    arrows(nmds.plot$sites[1,1],nmds.plot$sites[1,2],nmds.plot$sites[2,1],nmds.plot$sites[2,2])

etc.

Hope that helps, Dave

On 08/08/2011 08:20 AM, Kátia Emidio wrote:

Dear all, I have data from an inventory of plants in 6 randomly selected fragmented forests of the same size, measured during 3 periods of time, and I'd like to know about changes in species composition over time. Is it a good idea to use NMDS for each period of time and then compare the changes in the ordination diagram? What kind of analysis could be better?

Cheers, Katia
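The glyph-and-arrow plot described in the reply can be generalized with a short loop so that the arrows do not have to be typed one at a time. The sketch below uses base graphics only and a matrix of random coordinates, `pts`, as a stand-in for `nmds.object$points` (labdsv) or `nmds.plot$sites` (vegan); `site`, `date` and `n.arrows` are illustrative names, and the rows are assumed to carry one site label and one date label each.

```r
# Simulated ordination scores: 6 sites x 3 dates
set.seed(5)
site <- rep(1:6, each = 3)
date <- rep(1:3, times = 6)
pts <- cbind(rnorm(18), rnorm(18))   # stand-in for the NMDS coordinates

# Empty frame first, then one plotting symbol per site
plot(pts, type = "n", xlab = "NMDS 1", ylab = "NMDS 2")
points(pts, pch = site)

# Within every site, draw an arrow from each date to the next one
n.arrows <- 0
for (s in unique(site)) {
  idx <- which(site == s)[order(date[site == s])]
  for (j in seq_len(length(idx) - 1)) {
    arrows(pts[idx[j], 1], pts[idx[j], 2],
           pts[idx[j + 1], 1], pts[idx[j + 1], 2],
           length = 0.08)
    n.arrows <- n.arrows + 1
  }
}
```

Ordering by `date` inside the loop means the rows do not need to be pre-sorted, which addresses the "depending on how the sites are sorted" caveat in the reply.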

### Re: [R-sig-eco] The final result of TWINSPAN

Dear Zoltan,

Thanks for the note. The R function I wrote does in fact follow the Roleček et al. protocol, and that's partly what motivated the idea to write it up. Lubomír Tichý, Petr Smilauer, and Laco Mucina have all contributed information in the development, but I've still been stymied by the lack of solid information on the actual algorithm. I think it is quite possible to write a function that operates on the principle of TWINSPAN, following Roleček et al., but writing a function that exactly matches the output from the commercial package may prove to be too much trouble.

Thanks, Dave

Zoltan Botta-Dukat wrote:

Dear Dave, This modified version of TWINSPAN may be interesting for you when you compare methods:

Jan Roleček, Lubomír Tichý, David Zelený, Milan Chytrý (2009) Modified TWINSPAN classification in which the hierarchy respects cluster heterogeneity. Journal of Vegetation Science 20(4): 596–602
http://onlinelibrary.wiley.com/doi/10./j.1654-1103.2009.01062.x/abstract

Zoltan

On 2011.04.26. 23:40, Dave Roberts wrote:

Dear List, Earlier this year on an (undoubtedly ill-advised) lark I coded up an R version of TWINSPAN. It's far from a polished package at this point, but the code does run. One of the interesting features is that you can partition a PCO or NMDS in addition to the traditional CA. To be clear, I am not a TWINSPAN fan either, but I wanted it for a methods paper I was working on.

The problem is that I based the code on Hill, Bunce & Shaw (1975, J of Ecol 63:597-613), which is what I had available. Apparently the algorithm in the commercial TWINSPAN is significantly modified from the original, but I couldn't find a description of the actual algorithm anywhere in the literature. It is probably described in the User Manual of the software, but I was not sufficiently motivated to chase down a copy. I do have a copy of the FORTRAN code, but it was apparently written in FORTRAN II, and is basically inscrutable, even to an old FORTRAN dog like me.

So, if somebody has a clear description of the actual algorithm (and I think it is disturbing that I could not find one), it would be possible to code it up in native R. The alternative, to write a wrapper for the original FORTRAN code, is not a trivial task. I gave it a couple of days and gave up.

### Re: [R-sig-eco] The final result of TWINSPAN

Dear List,

Earlier this year on an (undoubtedly ill-advised) lark I coded up an R version of TWINSPAN. It's far from a polished package at this point, but the code does run. One of the interesting features is that you can partition a PCO or NMDS in addition to the traditional CA. To be clear, I am not a TWINSPAN fan either, but I wanted it for a methods paper I was working on.

The problem is that I based the code on Hill, Bunce & Shaw (1975, J of Ecol 63:597-613), which is what I had available. Apparently the algorithm in the commercial TWINSPAN is significantly modified from the original, but I couldn't find a description of the actual algorithm anywhere in the literature. It is probably described in the User Manual of the software, but I was not sufficiently motivated to chase down a copy. I do have a copy of the FORTRAN code, but it was apparently written in FORTRAN II, and is basically inscrutable, even to an old FORTRAN dog like me.

So, if somebody has a clear description of the actual algorithm (and I think it is disturbing that I could not find one), it would be possible to code it up in native R. The alternative, to write a wrapper for the original FORTRAN code, is not a trivial task. I gave it a couple of days and gave up.

On 04/14/2011 01:57 AM, Jari Oksanen wrote:

On 14/04/11 10:37 AM, Yong Zhang 2010202...@njau.edu.cn wrote:

Dear all, I conducted the two-way indicator species analysis using the TWINSPAN program, and the following is the final result:

0111 00011011 011000111 01001001

I have to certify my analysis; I want to classify the above 24 sampling sites into 3 major groups based on 7 biotic metrics. The names of my 24 samples could be site1 to site24, from left to right, and I set the cut levels to 0, 2, 5, 10, 20, the maximum level of divisions to 6, and the maximum group size for division to 3. Now, my question is whether my setting is correct? And how should I classify these sites into 3 groups according to this final result?

Dear Yong Zhang, This is not an R issue, because there is no TWINSPAN in R. However, the answer to your question is that strictly speaking you cannot group your data into three major groups with TWINSPAN. TWINSPAN is a bisection method, so the first division gives you two groups, and the second splits each of these into two groups, so that the next choice is to have four groups. However, in this case one of the groups was so small (3 plots were split off from the others in the first division, and then these were split into groups of 2 plots and 1 plot) that you probably can ignore the second division of the small group. If your goal was as vague as wanting to classify 24 sites into 3 major groups you could do better than use TWINSPAN: what's the problem with proper classification methods in R? Moreover, have you checked that your biotic metrics suit the pseudospecies cut level concept of TWINSPAN?

Cheers, jari oksanen

### Re: [R-sig-eco] Dissimilarity ranking

Burak,

I think your question is simpler than the suggestions of NMDS. One approach, let's say your dissimilarity matrix is called demodis:

    demodis2 <- as.matrix(demodis)        # make a full matrix copy
    is.na(diag(demodis2)) <- TRUE         # ignore the dissimilarity of a plot to itself
    apply(demodis2,1,mean,na.rm=TRUE)     # calculate the mean dissimilarity of every plot to the others

This is all done as part of function disana() in package labdsv, along with some other simple dissimilarity analyses.

Dave Roberts

On 11/23/2010 01:09 PM, Pekin, Burak K wrote:

Hello, I want to rank the dissimilarity of sites based on their species composition. For example, I would like to be able to say that site A is less similar in composition to the other sites than site B is to the other sites. I could do a cluster analysis and look at which sites are less closely clustered. It would be even better if I could come up with a quantitative scale, rather than a relative ranking, that would give a value for each site based on its relative dissimilarity to the rest of the sites. So site A might receive a 90 out of 100, whereas sites B and C might receive a 60 and a 50, indicating the rank as well as the 'relative quantity' of dissimilarity for each site.

Thanks, Burak

--
Burak K. Pekin, PhD
Postdoctoral Research Associate
Purdue University
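A runnable version of the mean-dissimilarity idea, on simulated data, is sketched below. `demodis` and `demodis2` follow the names in the reply; `comm`, `mean.dis` and `score` are made up here. The final rescaling to a 0-100 range is just one possible way to get the kind of score asked for in the quoted message, not something the reply proposes.

```r
# Simulated data; dist() stands in for whatever dissimilarity was actually used
set.seed(11)
comm <- matrix(runif(8 * 5), nrow = 8,
               dimnames = list(paste0("site", 1:8), NULL))
demodis <- dist(comm)

demodis2 <- as.matrix(demodis)   # full square matrix
diag(demodis2) <- NA             # ignore each plot's dissimilarity to itself
mean.dis <- apply(demodis2, 1, mean, na.rm = TRUE)

# Rank sites from most to least dissimilar to the rest
sort(mean.dis, decreasing = TRUE)

# One possible relative score: rescale the means to run from 0 to 100
score <- 100 * (mean.dis - min(mean.dis)) / (max(mean.dis) - min(mean.dis))
round(score)
```

Because the score is rescaled within the data set, a value of 100 only means "most dissimilar among these sites" and is not comparable across studies.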

### Re: [R-sig-eco] multivariate smoothing and gradient estimation

Chris,

One (sub-optimal) solution would be to fit a GAM and then have its predict method (predict.gam in mgcv, for example) estimate values immediately near your data points, which you could use to calculate a local gradient. If the GAM is reasonably smooth, I would think you could get reasonable estimates.

Dave

Chris Martin wrote:

Dear list members,

I am looking for a function that can calculate a surface from at least four predictor variables and one response variable. I would then like to calculate the first derivative at a specific point on this surface. I have looked at many packages for nonparametric smoothing and kernel density estimation but have been unable to find any that fulfill both these criteria. For example, loess can handle multivariate data, but I do not know how to extract the derivative from the resulting fit. Many smoothing splines offer predict functions to extract the derivative, but these functions can only handle univariate data (e.g. smooth.spline). Ideally, I would like to use local estimates of the surface (i.e. loess). I would appreciate any suitable functions or advice on where to look for functions that fulfill both these criteria.

Thank you very much for your time.

best wishes, Chris Martin
Population Biology Graduate Group '12, University of California, Davis
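The suggestion above, predicting just either side of a point and differencing, could look roughly like this. It is a sketch under several assumptions: the mgcv package, an additive model with one smooth per predictor, simulated data, a hypothetical helper `local_gradient()`, and a step size `h` chosen by hand.

```r
library(mgcv)

set.seed(1)
n <- 500

# Simulated data: four predictors and one response (not Chris's data)
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n), x4 = runif(n))
dat$y <- 2 * dat$x1 + dat$x2^2 + sin(3 * dat$x3) - dat$x4 + rnorm(n, sd = 0.05)

# Additive GAM with one smooth term per predictor (an assumed model form)
fit <- gam(y ~ s(x1) + s(x2) + s(x3) + s(x4), data = dat)

# Hypothetical helper: central finite differences on the fitted surface at x0
local_gradient <- function(fit, x0, vars, h = 1e-3) {
  sapply(vars, function(v) {
    up <- x0
    down <- x0
    up[[v]] <- x0[[v]] + h
    down[[v]] <- x0[[v]] - h
    (as.numeric(predict(fit, newdata = up)) -
       as.numeric(predict(fit, newdata = down))) / (2 * h)
  })
}

x0 <- data.frame(x1 = 0.5, x2 = 0.5, x3 = 0.5, x4 = 0.5)
grad <- local_gradient(fit, x0, c("x1", "x2", "x3", "x4"))

# True gradient of the simulating function at x0, for comparison
truth <- c(x1 = 2, x2 = 1, x3 = 3 * cos(1.5), x4 = -1)
```

The estimate is only as good as the fitted smooth. Near the edges of the data, or with a wiggly fit, the finite-difference gradient can be far from the truth, which is presumably why the approach is described as sub-optimal.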

### Re: [R-sig-eco] Teaching materials

Volker,

You're welcome to take a look at my site: http://ecology.msu.montana.edu/labdsv/R

It's primarily for multivariate analysis in community ecology, but it also has some material on GLMs and GAMs in an ecological context. You might also visit with Hank Stevens at Miami, as he's just next door to you.

Dave

On Wed, 2009-08-26 at 16:39 -0400, Volker Bahn wrote:

Hi all,

I'll be teaching a 400/600 level bio/ecostatistics class using R this fall (using Introductory Statistics with R as the textbook). I was wondering if anyone here had any teaching material (lecture slides, exercises, homework, projects, code for teaching, etc.) that they would like to share? Pointers to material on the web would also be welcome. I found a few courses with course material on the web, but they were typically programming oriented and in health sciences rather than ecology. If you'd rather reply off list, I'll summarize the responses for the list.

Thanks, Volker
Biological Sciences, Wright State University

### Re: [R-sig-eco] Clustering large data

Thierry and Hadley,

Sorry to be late coming into this (I forgot I subscribed to sig-eco). Package labdsv has a function called matrify() which takes a three-column data.frame (sample, taxa, abundance) and creates a full (sparse) matrix representation. I've never tried it on a data set as large as yours, and I'm curious if it would work. It's pure R, but if worst comes to worst I used to have a FORTRAN version that would probably work. Please give matrify a try and let me know.

Dave R.

    matrify <- function (data)
    {
        if (ncol(data) != 3)
            stop("data frame must have three column format")
        plt <- data[, 1]    # sample (plot) identifiers
        spc <- data[, 2]    # taxon identifiers
        abu <- data[, 3]    # abundances
        plt.codes <- levels(factor(plt))
        spc.codes <- levels(factor(spc))
        taxa <- matrix(0, nrow = length(plt.codes), ncol = length(spc.codes))
        row <- match(plt, plt.codes)
        col <- match(spc, spc.codes)
        # fill one cell per record; absent combinations stay 0
        for (i in 1:length(abu)) {
            taxa[row[i], col[i]] <- abu[i]
        }
        taxa <- data.frame(taxa)
        names(taxa) <- spc.codes
        row.names(taxa) <- plt.codes
        taxa
    }

hadley wickham wrote:

Hi Thierry,

Thanks for the more detailed report. I think the new version of reshape will help, but I just checked and it's currently a total mess and will need a lot of work before it's ready for anyone to try. Unfortunately I'm unlikely to get to it until the ggplot2 book is finished, so it might be a bit of a wait.

Hadley

On Tue, Oct 14, 2008 at 2:52 AM, ONKELINX, Thierry [EMAIL PROTECTED] wrote:

Hi Hadley,

Here is a more elaborate report of what I did and what went wrong. The example is not reproducible because the dataset is too large. A smaller dummy dataset is not an option, as it works with smaller datasets. I'm willing to run the code again with a development version of reshape.
Cheers,

Thierry

    > library(RODBC)
    > library(reshape)
    Loading required package: plyr
    > setwd("d:/wouter")
    > Sys.info()
                         sysname                      release
                       "Windows"                         "XP"
                         version                     nodename
    "build 2600, Service Pack 2"                 "LHPA000838"
                         machine                        login
                           "x86"           "thierry_onkelinx"
                            user
              "thierry_onkelinx"
    > sessionInfo()
    R version 2.7.2 (2008-08-25)
    i386-pc-mingw32

    locale:
    LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Dutch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252

    attached base packages:
    [1] stats     graphics  grDevices datasets  tcltk     utils     methods
    [8] base

    other attached packages:
    [1] reshape_0.8.1  plyr_0.1       RODBC_1.2-3    svSocket_0.9-5 svIO_0.9-5
    [6] R2HTML_1.59    svMisc_0.9-5   svIDE_0.9-5

    loaded via a namespace (and not attached):
    [1] tools_2.7.2
    > channel <- odbcConnectAccess("db1.mdb")
    > km <- sqlQuery(channel = channel, query = "SELECT KMhokcode AS Location, TaxonFK AS Species FROM kmhok_periode2_selectie ORDER BY KMhokcode, TaxonFK", as.is = TRUE)
    > odbcCloseAll()
    > km$value <- 1
    > dim(km)
    [1] 1157024       3
    > length(unique(km$Location))
    [1] 6354
    > length(unique(km$Species))
    [1] 1381
    > system.time(tmp <- cast(Location ~ Species, data = km[1:1000, ], fill = 0))
       user  system elapsed
       0.11    0.00    0.17
    > system.time(tmp <- cast(Location ~ Species, data = km[1:1, ], fill = 0))
       user  system elapsed
        1.7     0.0     1.7
    > system.time(tmp <- cast(Location ~ Species, data = km[1:10, ], fill = 0))
       user  system elapsed
      46.42    0.45   47.02
    > system.time(tmp <- cast(Location ~ Species, data = km, fill = 0))
    Error: cannot allocate vector of size 33.5 Mb
    Timing stopped at: 322.95 3.43 327.4
    > system.time(tmp <- table(km$Location, km$Species))
       user  system elapsed
       1.10    0.00    1.11

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4, 9500 Geraardsbergen, Belgium
tel. + 32 54/436 185
[EMAIL PROTECTED]
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-----Original message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of hadley wickham
Sent: Friday 10 October 2008 14:40
To: ONKELINX, Thierry
CC: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Clustering large data

Thanks for your responses. The biggest problem seems to be cast() for the reshape package, which could not handle the dataset. Peter's solution using the mefa package worked fine.
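For readers who want to try the long-to-wide conversion discussed in this thread on their own data, here is a small sketch that fills the site-by-species matrix in one vectorized step with matrix indexing instead of a per-record loop. The function name `long_to_wide()` and the toy `km` data frame are hypothetical, and this is not the labdsv or reshape implementation. For presence data with value 1 and no duplicated records, the result should match `table()`, which the timings above show finishing in about a second on the full dataset.

```r
# Hypothetical helper: three-column (sample, taxon, value) data to a full matrix
long_to_wide <- function(data) {
  stopifnot(ncol(data) == 3)
  plt <- factor(data[[1]])
  spc <- factor(data[[2]])
  taxa <- matrix(0, nrow = nlevels(plt), ncol = nlevels(spc),
                 dimnames = list(levels(plt), levels(spc)))
  # A two-column index matrix addresses (row, column) cells all at once;
  # if a sample/taxon pair is duplicated, the last record wins
  taxa[cbind(as.integer(plt), as.integer(spc))] <- data[[3]]
  taxa
}

# Toy stand-in for the km data frame in the transcript (made-up records)
km <- data.frame(Location = c("A", "A", "B", "C", "C", "C"),
                 Species  = c("sp1", "sp3", "sp2", "sp1", "sp2", "sp3"),
                 value    = c(1, 1, 1, 1, 1, 1))

wide <- long_to_wide(km)

# Cross-check against table(), which counts records per combination
tab <- table(km$Location, km$Species)
```

The full matrix for the dataset in the transcript would have 6354 x 1381 cells, roughly 8.8 million doubles or about 70 MB, so memory rather than speed is likely to be the limiting factor on a 2008-era 32-bit machine.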