Re: [R-sig-eco] fonction spc.pres in labdsv package

2013-04-26 Thread Dave Roberts

Hi Lisa,

   Sorry, I'm several days behind in monitoring the list.  I'm not sure 
I can help without a copy of your data.  Since your dataframe appears to 
have both strings (i,p,etc) and integers I'm not sure what you're 
getting.  table(unlist(veg)) ignores NAs, although I think vegtrans 
would have warned you about NAs.


   If the data file is too large to send as an attachment (to me, not 
the list), perhaps you could show snippets.  E.g.


 veg <- read.csv(file=file.choose(), dec=",", sep=";", header=TRUE)
 any(is.na(veg))
 veg[1:10,1:10]

and then

 newveg <- vegtrans(veg,x,y)
 any(is.na(newveg))
 newveg[1:10,1:10]

and then

 spc.pres <- apply(newveg>0,2,sum)
 any(is.na(spc.pres))
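
One possible explanation, offered only as a sketch: if the empty cells were read in as NA, every column sum becomes NA, sort() silently drops the NAs, and plot() is left with nothing to draw, which would match the 'xlim' error. The toy data frame below is invented for illustration and is not Lisa's data.

```r
# Hypothetical toy data: rows are elevations, columns are species,
# and empty cells have been read in as NA
toy <- data.frame(sp1 = c(NA, 15, NA),
                  sp2 = c(3, NA, 0.5),
                  sp3 = c(NA, NA, 37.5))

# A single NA in a column makes its sum NA
spc.pres <- apply(toy > 0, 2, sum)

# sort() drops NAs by default, so nothing is left to plot
spc.sorted <- sort(spc.pres)
length(spc.sorted)

# Counting only the non-missing cells gives the number of occurrences
spc.pres.ok <- apply(toy > 0, 2, sum, na.rm = TRUE)
spc.pres.ok
```

If that is what is happening, either na.rm = TRUE or replacing the NAs with 0 before the call should let plot(sort(spc.pres)) run.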

Dave Roberts


On 04/19/2013 06:49 AM, lisa couet wrote:

Hi,

I have an issue concerning the function spc.pres in package labdsv.
The message is:

Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

My code:

veg <- read.csv(file=file.choose(), dec=",", sep=";", header=TRUE)
attach(veg)

((to change my data into the Braun-Blanquet index))

library(labdsv)
x <- c("i","p","p1","p2","p3",1,12,13,14,15,21,22,23,24,25,31,32,33,34,35,41,42,43,44,45,51,52,53,54,55)
y <- c(0.5,3.0,3.0,3.0,3.0,15.0,15.0,15.0,15.0,15.0,37.5,37.5,37.5,37.5,37.5,62.5,62.5,62.5,62.5,62.5,85.0,85.0,85.0,85.0,85.0,97.5,97.5,97.5,97.5,97.5)
newveg <- vegtrans(veg,x,y)

((abundance distribution))

ab <- table(unlist(newveg))

((the abundances display correctly))

((number of occurrences))

spc.pres <- apply(newveg>0,2,sum)
plot(sort(spc.pres))

The error message appears here.

I have many NAs because it is a file where columns are species and rows are 
elevations. So when a species is not present, the cell is empty.

I hope one of you can help me,


kind regards,
lisa





___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460



Re: [R-sig-eco] Zuur / Pinierho / Faraway

2012-11-30 Thread Dave Roberts

Philip,

   Is there an online errata, or do you just have to be smart and diligent?

Thanks, Dave

On 11/29/2012 07:30 AM, Dixon, Philip M [STAT] wrote:

I agree with all the previous comments and second Tom's recommendation of Faraway 
as an 'in between' Zuur and Pinheiro & Bates.  One thing to be careful of: 
While the advice in Faraway is sound, there are more than a few mistakes in his 
equations.

Philip Dixon



Re: [R-sig-eco] IndVal groups Plotting an RDA with labels

2012-07-03 Thread Dave Roberts

Caitlin,

Depending on the size of your data set one way to identify clusters 
is to label the dendrogram with cluster number.  Let's say your Ward's 
result is called 'ward.hcl', and your cluster membership (perhaps 
returned from cutree) is in a vector called 'clustid'


 plot(ward.hcl,labels=clustid)

will produce the dendrogram with cluster number identified on the 
bottom.  If the number of plots is too high they overwrite and it gets 
hard to read, but usually you can make it out if you stretch the plot.


   Alternatively, size is sometimes indicative, and you can simply

 table(clustid)

and see if cluster size is unique.
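
The two suggestions above can be sketched end to end as follows. The data are simulated, the object names follow the ones used in this message, and method = "ward.D2" assumes a recent R (older versions used "ward"). rect.hclust() from the stats package is added as one possible way to get the boxes around clusters mentioned in the question.

```r
# Simulated data (assumption): 30 plots, 5 species
set.seed(1)
veg <- matrix(runif(150), nrow = 30)

ward.hcl <- hclust(dist(veg), method = "ward.D2")
clustid <- cutree(ward.hcl, k = 4)

# Dendrogram with each leaf labelled by its cluster number
plot(ward.hcl, labels = clustid)

# Coloured boxes around the same four clusters
rect.hclust(ward.hcl, k = 4, border = 2:5)

# Cluster sizes, to compare against the group sizes in the indicator output
table(clustid)
```

As long as the same clustid vector is the one passed to the indicator species analysis, the numbers on the dendrogram should correspond to the group numbers in that output.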

Dave Roberts

On 07/03/2012 07:12 AM, Caitlin Porter wrote:

Dear all,


I am working with a large data set to examine plant community structure on
barrens habitats and have a couple of questions regarding IndVal (labdsv
package) on Ward's clusters (hclust function in stats package) and plotting
axes of an RDA result with plot, plot.cca or eqscplot functions (vegan,
MASS, graphics packages).

1.  I have run IndVal (package labdsv) on a Ward’s Cluster Analysis to
determine indicator species for groups. I selected k=5 groups as my most
meaningful and conservative (sample size, etc) classification. However,
according to my average silhouette width, k=12 is the most optimal number
of groups.  I ran an IndVal on k=12 out of curiosity and notice some
interesting patterns I would like to explore further but I am having
difficulty understanding the output. The groups are labeled in the
indicator species output as “group 1,2,3,4….12”. I do not understand how to
determine which group is associated with which cluster (e.g.. in my Ward’s
dendrogram – which is group 11?). It is somewhat obvious when k is
relatively small because of the order the groups are clustered in, branch
height in the dendrogram, and species frequency and abundances, however I’d
like to know for larger groups. It would be ideal if I could even label the
dendrogram with these groups. I’ve seen examples of these in a couple of
papers with color coded boxes, but I can’t  seem to figure out how to code
it myself.

  2.  My second question relates to plotting an RDA. I have been able to
run an RDA in vegan package successfully but unable to plot it in a way
that I can interpret. I need to label sites, species (response matrix) and
environmental variables/PCA axes (explanatory matrix).

  So far, I’ve only been able to label either the response matrix or the
explanatory matrix in my graphs, but not all 3 sets of points. I’ve tried
modifying plot function and code from Borcard et al 2011, (Numerical
ecology with R), eqscplot code for MASS package and plot.cca in vegan
package.  I would prefer to use eqscplot since I already understand how to
better customize it, but I’m just looking to get any graph I can read at
this point.  When I use the plot function from Borcard et al. I see PC axes
names only. When I use eqscplot, I see species names only.  I also tried
plot.cca in vegan package but wasn’t able to call up a graph. This code
looks like a great way to do it, but I’m not sure what I’m doing, e.g.
what to put in for const. or what the ‘unexpected symbol’ error means.

  This old thread asks a similar question (
https://stat.ethz.ch/pipermail/r-help/2009-February/188282.html), but I’m
not sure I understand its solution and have approximately 300 species so
providing a separate name for each individually might not be feasible. This
other thread asks another similar question (
http://r.789695.n4.nabble.com/RDA-Triplot-td3055474.html) but the author
finds an error is generated specifying that biplot is not an appropriate
method.

I have included my code for question 2 below. Any help would be very much
appreciated!


Sincere thanks,


Caitlin


#eqscplot (MASS package)

library(MASS)

# subset species and site scores from the RDA for the first 6 RDA axes
sr <- scores(c1.rda, display = c("sites", "species"),
             choices = c(1,2,3,4,5,6), scaling = 2)

sites.only <- as.data.frame(sr$sites)

srsp <- as.data.frame(sr$species) # data frame with just the species in it

c1.site <- c1[,1] # object with just the site names from the original data set

cp.m <- merge(c1.site, sites.only, by=0, sort=FALSE) # merged site object with site names

# defining the variables, limits of plot and what the symbols look like
eqscplot(cp.m$RDA1, cp.m$RDA2, xlim=c(-1, 1), ylim=c(-1, 1), col="blue",
         xlab="RDA Axis 1", ylab="RDA Axis 2", cex=0.3)

text(cp.m$RDA1, cp.m$RDA2, labels=cp.m$x, col="black", cex=0.3) # adding names on plots

text(srsp$RDA1, srsp$RDA2, labels=rownames(srsp), col="red", cex=0.3) # adding names/species

# Error in text.default(cp.m$RDA1, cp.m$RDA2, labels = cp.m$x, col = "black",  :
#   zero length 'labels'



#Borcard et al. 2011 plot function

plot(c1.rda, main="Triplot RDA - scaling 2 - wa scores")

spe.c1 <- scores(c1.rda, choices=1:6, scaling=2, display="sp")

arrows(0,0,spe.c1[,1], spe.c1[,4], length=0, lty=1, col="red")



c1.rda.species- as.data.frame(c1

Re: [R-sig-eco] [R] Component analysis / cluster analysis of multiple sites based on soil characteristics

2012-01-23 Thread Dave Roberts

Sacha,

   I do not fully understand your objectives, but there are several 
things to bear in mind in your approach below.   You refer to your 
result object as water.pca, but it's simply a distance matrix, not a 
PCA.  More problematic, perhaps, is that it's calculated on a matrix 
with very different values for the columns, e.g. temp is roughly 30 and 
no2 is roughly 0.01.  In calculating Euclidean distance (the default for dist()) 
these scales matter a lot.  If it's truly a clustering of sites based on 
these attributes you want you should standardize the columns before 
calculating dist().


   Once you have a distance matrix from the standardized data you could 
use pam or agnes (as you have already done) but might also want to see 
an ordination.  Given a Euclidean distance matrix I would recommend 
Principal Coordinates Analysis (PCO or PCoA depending on source) which I 
believe is available in the ecodist package you already have loaded.
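
A minimal sketch of that sequence, assuming simulated data in place of water.par. Base R's cmdscale() is used here for the Principal Coordinates Analysis instead of the ecodist function, simply to keep the example free of extra packages.

```r
# Simulated stand-in for water.par (assumption): variables on very different scales
set.seed(42)
water.par <- data.frame(temp = rnorm(50, 30, 2),
                        pH   = rnorm(50, 7.4, 0.2),
                        no2  = rnorm(50, 0.005, 0.002))

# Standardize each column to mean 0 and standard deviation 1
water.std <- scale(water.par)

# Euclidean distance on the standardized data
water.dist <- dist(water.std)

# Principal Coordinates Analysis with base R's cmdscale()
water.pco <- cmdscale(water.dist, k = 2, eig = TRUE)
plot(water.pco$points[, 1], water.pco$points[, 2],
     xlab = "PCO 1", ylab = "PCO 2")
```

The same water.dist object could then be handed to pam() or agnes() so that the clustering and the ordination are based on identical distances.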


Dave Roberts

On 01/23/2012 05:51 AM, Sacha Viquerat wrote:

Hello dear list!
Maybe I am demanding too much, but I am having problems finding the
right way to tackle a seemingly trivial problem:

We counted fish at different sites. In order to assess habitat quality
at each site, we sampled temperature, pH etc. at each site, resulting in
243 observations of 8 independent variables. As we would like to
identify clusters within this data set, we stumbled upon three
approaches: two as realized in package cluster, using dist to create a
distance matrix from our numeric variables and then pam to produce a
model or agnes and then various tree methods to simplify the tree, as
well as an approach via the ecodist package (using distance and pco).
while results obtained through the cluster package were the same
(phew!), the result from the ecodist approach did not identify clusters
at all. As we are all confused and I am the one in charge of deciding
which way to go, and as I am the one most confused after all, I am
completely lost. Doing such an analysis for the first time, I would be
satisfied with the pam approach identifying 2 clusters (via iterating
over each k in 2:10 and picking the max average silhouette of each
model). However, as there are so many different approaches out there, I
am not sure if all the assumptions are met! It seems for example that pH
is more or less randomly distributed. Should we keep such variables? How
can I access the actual loadings of the principal axis of the pam model?
Couldn't find that anywhere! In the end, there are only 33 observations
in the 2nd group, which will be making the further analysis of fish
counts heavily unbalanced. Any suggestions?

Code snippet:

water.par

temp pH DO BOD COD no3 no2 po4
1 33.5 7.4 5.30 4.04 15.0 0.120 0.008 0.20
2 33.5 7.4 5.30 4.04 15.0 0.120 0.008 0.20
.
.
.
243 29.1 7.4 6.80 12.56 45.0 0.740 0.002 0.32

water.pca <- dist(as.matrix(water.par))
k=best.k(water.pca,c(2,10),stand=T,trace=1) # finding the k with highest average silhouette dist
clus.model <- pam(water.pca,k,stand=T)

clus.model$clusinfo

size max_diss av_diss diameter separation
1 210 30.12712 8.445552 42.29439 27.88689
2 33 12.74630 7.725452 21.91972 27.88689

water.md <- distance(water.par, "euclidean")
water.pco <- pco(water.md)
plot(water.pco$vectors[,1], water.pco$vectors[,2])

Thanks in advance and sorry for the verbosity level at max!!!



Re: [R-sig-eco] Recommended R package for analyzing community data set? - nested design, stratified random sampling, covariates

2012-01-17 Thread Dave Roberts

Dear Laura,

   David is certainly correct that vegan would provide a wealth of 
tools for the analysis of your data.  More generally, if you go to CRAN 
and browse the "Environmetrics" Task View you will see a review of 
many packages suitable for specific analyses you may be interested in.


Dave Roberts

On 01/15/2012 12:10 PM, Laura S wrote:

Dear all:

May you recommend an R package for analyzing this data set? I would greatly 
appreciate any thoughts you can provide.


I. Study goals

This study examines soil crust (lichens and bryophytes) recovery and succession 
in fields that underwent different levels of disturbance.


II. Variables
Response variables of interest: soil crust cover (categorical scale - described 
below), species richness, species composition
Explanatory variable of interest: disturbance regime (categorical variable)
Environmental variables measured (covariates - mix of categorical and numerical 
variables): cover of mineral soil, litter, vascular plant bases, stones, or 
rocks,
slope, aspect

III. Study sampling and design
Eight research areas (BR, CB, CC, JL, PC, PL, SL, TR)
Within each research area subplots were assigned six disturbance treatments 
(NC, NS, OC, OS, SC, SS) based on disturbance history
A single transect was placed randomly in the center of each subplot and sampled 
in twenty 20 x 20 cm plots at 1 m intervals along the transect
47 of 48 possible treatment subplots were sampled (n=6 for 7 sites, n=5 for 1 
site)


Sampling cover scale:

Scale value    Representative % cover

1              <= 1
2              1-4
3              4-10
4              10-25
5              25-50
6              50-75
7              75-95
8              95-100


There were three different sampling times (spread over two years), but time of 
sampling was not considered as a confounding factor given the way sampling was 
conducted with the particular communities studied (soil crust communities).

Total species positively identified: 33 taxa (species and species groups), 15 
of these were in four or more
of 47 subplots (n=6 x 7, n=5 x 1)
Unidentified collected taxa: less than 0.5%
Approximate taxa pool (species observed in entire areas, but not necessarily in 
sample plots): 26 lichen + 21 bryophytes
= 47 taxa

Thank you for your time and consideration,
Laura


Re: [R-sig-eco] The problems on BIO-ENV procedure by [R]

2011-12-05 Thread Dave Roberts

Shun Tsuboi,

   isoMDS (and routines based on it) will not allow zero dissimilarity, 
which implies perfect replicates.  One alternative is to remove one of 
the replicates, but that may have unsatisfactory effect on further 
analyses.  Alternatively, if you are sure you want to keep the duplicate 
plots you can change the zero to a small value


d[d==0] <- 0.0001

and run the isoMDS on the resulting revised dissimilarity matrix.
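
As a small illustration of that replacement, here is a sketch on invented data with one pair of identical plots. dist() stands in for vegdist() (any object of class "dist" is handled the same way), and the isoMDS() call is guarded in case MASS is not installed.

```r
# Toy abundance data (assumption): rows 1 and 2 are identical plots
comm <- rbind(c(1, 0, 3), c(1, 0, 3), c(4, 2, 0), c(0, 5, 1), c(2, 2, 2))
d <- dist(comm)            # stand-in for vegdist(comm, "bray")
n.zero <- sum(d == 0)      # number of zero dissimilarities

d[d == 0] <- 0.0001        # nudge the zeros; the object stays a "dist"

# isoMDS() lives in MASS; run it only if that package is available
if (requireNamespace("MASS", quietly = TRUE)) {
  ord <- MASS::isoMDS(d, k = 2, trace = FALSE)
}
```

The replacement value is arbitrary; it only needs to be small relative to the other dissimilarities in the matrix.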

Undoubtedly someone from the vegan group will respond to your second 
question.


Dave Roberts

On 12/05/2011 01:28 AM, 坪井 隼 wrote:

Dear Madam / Sir,

I have two questions for the use of the “R” program for the ecological
research. I am studying the relationship between the community
structures of environmental microbes and some environmental conditions.
For this objective, I have known that the BIOENV procedure, which was
developed by Clarke & Ainsworth (1993), is available in the “R” software.


First question:
I attempted to use the procedure to analyze the relationship between
the variation of the microbial community structures and the
environmental factors. However, I cannot analyze the relationship based
on the isoMDS function, because isoMDS does not accept my dataset. The
commands I wrote for the BIOENV procedure, and the error
message I received, were as follows:


library(MASS)
library(vegan)
communitydat <- read.table("C:/Documents and Settings/shuntsuboi/desktop/bray.txt", header=T)

environdat <- read.csv("C:/Documents and Settings/shuntsuboi/desktop/ev.csv", header=T)

env <- environdat[,c("variablesA","variablesB.","variablesC","variablesD","variablesE")]

d <- vegdist(communitydat, "bray")
isoMDS(d)

Error in isoMDS(d) : zero or negative distance between objects 1 and 2

As mentioned above, I can not run the program because the error, which
is “isoMDS(d) : zero or negative distance between objects 1 and 2”,
occurred. What are the ways to solve this problem ? On isoMDS function,
what are the ways that the zero distance of “Bray-Curtis distance” is
acceptable in the function ?

Second question:
Based on the commands above, I ran the metaMDS function. However,
although the two-dimensional ordination plot was drawn automatically,
I could not obtain the X and Y values of the respective plots.
The following message was shown:
“In ordiplot(x, choices = choices, type = type, display = display,  :
Species scores not available”

How can I solve this problem?

Sincerely yours,
Shun Tsuboi



Re: [R-sig-eco] Multivariate ANOVA/repeated measures

2011-10-10 Thread Dave Roberts



On 10/07/2011 08:51 AM, Dr N.A. Cutler wrote:

Dear All,

I have a query about multivariate analysis of community data.

In my experiment, 24 microbial communities in different locations were
sampled using Sampling Technique 1 (ST1). A site X species matrix was
then derived by molecular analysis.

The same 24 locations were then sampled again using a different sampling
technique (ST2) and a second site X species matrix was derived. It is
assumed that community structure remains intact after sampling by
Technique 1 i.e. the two techniques can sample from the same pool of
organisms.

I want to compare the results of the two sampling exercises in order to
test the performance of the two sampling techniques. My research
question is: does Technique 1 produce a similar signal to Technique 2?
Or do the different techniques give significantly different pictures of
community structure? The null hypothesis is that there is no significant
difference between the two sampling techniques i.e. they both capture
community structure with the same degree of accuracy.

It occurred to me that I could use a multivariate ANOVA technique (e.g.
Adonis) to distinguish between the results of the two sampling
exercises, using sampling technique as a factor. But I am not sure how
to deal with the obvious correlation between sample pairs. Should this
situation be addressed as a repeated measures experiment with two time
steps? If so, what is the best technique to use (a mixed model, perhaps?)

Any advice would be gratefully received.

Best wishes,

Nick Cutler



Nick,

   I would try something pretty direct.  Any appeal to differences in 
dissimilarities confounds the effects with the particular 
dissimilarity/distance matrix you use.  Assuming the samples and species 
are in the same order, and that the data.frames are the same size, you 
might try


 actual <- sum((ST1-ST2)^2)

and then permute one of the two matrices numerous times

res <- rep(NA,999)
for (i in 1:999) {
 res[i] <- sum((ST1-ST2[sample(1:nrow(ST2),replace=FALSE),])^2)
}
final <- (sum(res <= actual) + 1)/1000

and see what fraction of the permuted matrices are as similar.
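
A self-contained version of the permutation above, as a sketch only. ST1 and ST2 are simulated here (24 sites by 10 species, with ST2 built as ST1 plus noise so that the pairing carries real signal); with real data they would be the two site-by-species matrices in matching row and column order.

```r
set.seed(123)
# Simulated stand-ins (assumption): 24 sites x 10 species, ST2 is ST1 plus noise
ST1 <- matrix(rpois(240, lambda = 5), nrow = 24)
ST2 <- ST1 + matrix(rpois(240, lambda = 1), nrow = 24)

# Observed squared difference with the true site pairing
actual <- sum((ST1 - ST2)^2)

# Shuffle the rows of ST2 to break the site pairing
nperm <- 999
res <- rep(NA, nperm)
for (i in 1:nperm) {
  res[i] <- sum((ST1 - ST2[sample(1:nrow(ST2), replace = FALSE), ])^2)
}

# Fraction of permutations at least as similar as the observed pairing
final <- (sum(res <= actual) + 1) / (nperm + 1)
final
```

A small value of final indicates that the two techniques agree site by site far more closely than randomly matched sites would.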

Hopefully Gavin will weigh in with a better randomization.

   If you do go with a multivariate approach I might try a procrustes 
analysis of PCO ordinations.


Dave


Re: [R-sig-eco] Multivariate ANOVA/repeated measures

2011-10-10 Thread Dave Roberts



On 10/10/2011 02:15 PM, Gavin Simpson wrote:

On Mon, 2011-10-10 at 09:11 -0700, Rich Shepard wrote:

On Mon, 10 Oct 2011, Dave Roberts wrote:


I want to compare the results of the two sampling exercises in order to
test the performance of the two sampling techniques.



   I would try something pretty direct. Any appeal to differences in
dissimilarities confounds the effects with the particular
dissimilarity/distance matrix you use. Assuming the samples and species
are in the same order, and that the data.frames are the same size, you
might try


I did not read the original message, so I hope you'll allow me to join the
thread. My recommendation is to use univariate tree models, particularly a
classification tree (for ordinal explanatory variables; i.e., ST1 and ST2).


But the response here is *multivariate* - of course, one could use Glen
De'Ath's multivariate regression trees (despite the name it is really a
constrained clustering/classification) - but I think there are better
ways of solving this particular problem. And unless one has many 100s of
observations, the model will need some sort of variance reduction
applied (via bagging, or some such) as the one fitted model is
potentially highly unstable.

G


This is fully, carefully, and non-technically explained in Chapter 9
(particularly Sections 9.3 and 9.4) in Zuur, Ieno, and Smith Analysing
Ecological Data. For that matter, I highly recommend reading the whole
book.

Rich





It would be fairly simple to boil down to a univariate question.  You 
could do something as simple as a paired t-test of plot-level species 
richness or the number of individuals sampled (to compare sampling 
efficiency), but I still don't see an independent and a dependent variable.
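
The paired comparison of plot-level richness could look roughly like this. The matrices are simulated placeholders, and species richness is taken to be the number of species with a non-zero count at each site.

```r
set.seed(7)
# Simulated stand-ins (assumption): 24 sites x 15 species, same site order in both
ST1 <- matrix(rpois(360, lambda = 1.0), nrow = 24)
ST2 <- matrix(rpois(360, lambda = 1.5), nrow = 24)

# Plot-level species richness: number of species present at each site
rich1 <- rowSums(ST1 > 0)
rich2 <- rowSums(ST2 > 0)

# Paired t-test, pairing the two techniques by site
tt <- t.test(rich1, rich2, paired = TRUE)
tt
```

Replacing rowSums(ST > 0) with rowSums(ST) would give the same test on the number of individuals sampled.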


Dave


Re: [R-sig-eco] help with analysis

2011-08-19 Thread Dave Roberts

Hi Kátia,

   Sorry to be so late; just back in the office.

   Pedro is correct that adonis will help establish statistical 
significance to potential differences, but I think NMDS could still be 
very helpful.  One approach would be to code the sites by glyph (e.g. 
site 1 = circle, site 2 = triangle, etc.) and then to draw arrows from 
the first date to the second, second to third, etc, for each site.


   In labdsv you could do this if your nmds is called nmds.object

 plot(nmds.object,type='n')

to draw the axes but not the points, and then use

 points(nmds.object,site==1,pch=1)
 points(nmds.object,site==2,pch=2)

etc.  Then, depending on how the sites are sorted, the arrows could be drawn

 
 arrows(nmds.object$points[1,1],nmds.object$points[1,2],
   nmds.object$points[2,1],nmds.object$points[2,2])

to draw an arrow from the first point to the second.  You might need to 
mess with the parameters of arrows() to get the arrow sizes you want.


   In vegan you could do

 nmds.plot <- plot(metaMDS(dissimilarity or taxon matrix),type='n')
 points(nmds.plot$sites[site==1,])
 points(nmds.plot$sites[site==2,],pch=2)
etc
 arrows(nmds.plot$sites[1,1],nmds.plot$sites[1,2],
   nmds.plot$sites[2,1],nmds.plot$sites[2,2])

etc.
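
The glyph-and-arrow idea can be written as a loop over sites. This is a sketch on simulated data sorted by site and then by date; cmdscale() is used only to obtain two-dimensional coordinates without extra packages, and NMDS scores from labdsv or vegan would be plugged in the same way.

```r
set.seed(3)
# Simulated stand-in (assumption): 6 sites x 3 dates, sorted by site then date
site <- rep(1:6, each = 3)
comm <- matrix(rpois(18 * 8, lambda = 4), nrow = 18)

# 2-D coordinates; replace with the NMDS point coordinates in practice
pts <- cmdscale(dist(comm), k = 2)

plot(pts, type = "n", xlab = "Axis 1", ylab = "Axis 2")
n.arrows <- 0
for (s in unique(site)) {
  idx <- which(site == s)                 # rows for this site, in date order
  points(pts[idx, , drop = FALSE], pch = s)
  for (j in seq_len(length(idx) - 1)) {   # date 1 -> 2, then date 2 -> 3
    arrows(pts[idx[j], 1], pts[idx[j], 2],
           pts[idx[j + 1], 1], pts[idx[j + 1], 2], length = 0.08)
    n.arrows <- n.arrows + 1
  }
}
```

The length argument of arrows() controls the size of the arrow heads.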

Hope that helps, Dave

On 08/08/2011 08:20 AM, Kátia Emidio wrote:

Dear all,
I have data from an inventory of plants in 6 fragmented forests of the same
size, randomly selected, measured during 3 periods of time, and I'd like to
know about changes in species composition over time. Is it a good idea to
use NMDS for each period of time
and then to compare the changes in the ordination diagram? What kind of
analysis would be better?
Cheers,
Katia



Re: [R-sig-eco] The final result of TWINSPAN

2011-04-27 Thread Dave Roberts

Dear Zoltan,

   Thanks for the note.  The R function I wrote does in fact follow the 
Roleček et al protocol, and that's partly what motivated the idea to 
write it up. Lubomír Tichý, Petr Smilauer, and Laco Mucina have all 
contributed information in the development, but I've still been stymied 
by the lack of solid information on the actual algorithm.


   I think it is quite possible to write a function that operates on 
the principle of TWINSPAN, following Roleček et al, but writing a 
function that exactly matches the output from the commercial package may 
prove to be too much trouble.


Thanks, Dave

Zoltan Botta-Dukat wrote:

Dear Dave,

This modified version of TWINSPAN may be interesting for you when you 
compare methods:


Modified TWINSPAN classification in which the hierarchy respects cluster 
heterogeneity


Jan Roleček, Lubomír Tichý, David Zelený, Milan Chytrý 2009 Modified 
TWINSPAN classification in which the hierarchy respects cluster 
heterogeneity Journal of Vegetation Science 20(4): 596–602
http://onlinelibrary.wiley.com/doi/10./j.1654-1103.2009.01062.x/abstract 



Zoltan

On 2011.04.26. at 23:40, Dave Roberts wrote:

Dear List,

Earlier this year on an (undoubtedly ill-advised) lark I coded up an R 
version of TWINSPAN. It's far from a polished package at this point, 
but the code does run. One of the interesting features is that you can 
partition a PCO or NMDS in addition to the traditional CA. To be 
clear, I am not a TWINSPAN fan either, but I wanted it for a methods 
paper I was working on.


The problem is that I based the code on Hill, Bunce & Shaw (1975,
J of Ecol 63:597-613) which is what I had available. Apparently the 
algorithm in the commercial TWINSPAN is significantly modified from 
the original, but I couldn't find a description of the actual 
algorithm anywhere in the literature. It is probably described in the 
User Manual of the software, but I was not sufficiently motivated to 
chase down a copy. I do have a copy of the FORTRAN code, but it was 
apparently written in FORTRAN II, and is basically inscrutable, even 
to an old FORTRAN dog like me.


So, if somebody has a clear description of the actual algorithm (and I 
think it is disturbing that I could not find one), it would be 
possible to code it up in native R. The alternative, to write a 
wrapper for the original FORTRAN code is not a trivial task. I gave it 
a couple of days and gave up.






Re: [R-sig-eco] The final result of TWINSPAN

2011-04-26 Thread Dave Roberts

Dear List,

Earlier this year on an (undoubtedly ill-advised) lark I coded up 
an R version of TWINSPAN.  It's far from a polished package at this 
point, but the code does run.  One of the interesting features is that 
you can partition a PCO or NMDS in addition to the traditional CA. To be 
clear, I am not a TWINSPAN fan either, but I wanted it for a methods 
paper I was working on.


The problem is that I based the code on Hill, Bunce & Shaw (1975,
J of  Ecol  63:597-613) which is what I had available.  Apparently the 
algorithm in the commercial TWINSPAN is significantly modified from the 
original, but I couldn't find a description of the actual algorithm 
anywhere in the literature.  It is probably described in the User Manual 
of the software, but I was not sufficiently motivated to chase down a 
copy.  I do have a copy of the FORTRAN code, but it was apparently 
written in FORTRAN II, and is basically inscrutable, even to an old 
FORTRAN dog like me.


So, if somebody has a clear description of the actual algorithm 
(and I think it is disturbing that I could not find one), it would be 
possible to code it up in native R.  The alternative, to write a wrapper 
for the original FORTRAN code is not a trivial task.  I gave it a couple 
of days and gave up.



On 04/14/2011 01:57 AM, Jari Oksanen wrote:

On 14/04/11 10:37 AM, Yong Zhang2010202...@njau.edu.cn  wrote:


Dear all,

I conducted the two-way indicator species analysis using TWINSPAN program, and
following is the final result:

  0111
  00011011
  011000111
   01001001

I have to certify my analysis, I want to classify the above 24 sampling sites
into 3 major groups based on 7 biotic metrics. The name of my 24 samples could
be site1 to site24, from the left to the right, and I set the cut levels 0, 2,
5, 10, 20,  the maximum level of divisions: 6, and maximum group size for
division:3 .

Now, my question is whether my setting is correct? And how should I classify
these sites into 3 groups according to this final result?

Dear Yong Zhang,

This is not an R issue, because there is no TWINSPAN in R. However, the
answer to your question is that strictly speaking you cannot group your data
into three major groups with TWINSPAN. TWINSPAN is a bisection method so
that first division gives you two groups, and second splits each of these
into two groups so that the next choice is to have four groups. However, in
this case one of the groups was so small (3 plots were split off from other
in the first division, and then these were split into groups of 2 plots and
1 plot) that you probably can ignore the second division of the small group.

If your goal was as vague as wanting to classify 24 sites into 3 major
groups you could do better than use TWINSPAN: what's the problem with proper
classification methods in R? Moreover, have you checked that your biotic
metrics suit to the pseudospecies cut level concept of TWINSPAN?
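
As one illustration of a classification in R that yields exactly three groups, here is a hedged sketch using hierarchical clustering. The metrics matrix is simulated, and average linkage on standardized metrics is only one of many reasonable choices.

```r
set.seed(11)
# Simulated stand-in (assumption): 24 sites x 7 biotic metrics
metrics <- matrix(rnorm(24 * 7), nrow = 24,
                  dimnames = list(paste0("site", 1:24), paste0("metric", 1:7)))

# Standardize the metrics, cluster, and cut the tree into exactly 3 groups
hc <- hclust(dist(scale(metrics)), method = "average")
groups <- cutree(hc, k = 3)

table(groups)                  # group sizes
split(names(groups), groups)   # site names in each group
```

Unlike a bisection method, cutree() can return any number of groups from the same tree.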

Cheers, jari oksanen



Re: [R-sig-eco] Dissimilarity ranking

2010-12-09 Thread Dave Roberts

Burak,

I think your question is simpler than the suggestions of NMDS.  One 
approach, let's say your dissimilarity matrix is called demodis


 demodis2 <- as.matrix(demodis)          # make a full matrix copy
 is.na(diag(demodis2)) <- TRUE           # ignore the dissimilarity of a
                                         # plot to itself
 apply(demodis2,1,mean,na.rm=TRUE)       # calculate the mean dissimilarity
                                         # of every plot to the others


This is all done as part of function disana() in package labdsv along 
with some other simple dissimilarity analyses.
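
A runnable version on simulated data, as a sketch. The final rescaling to a 0-100 score is an added suggestion aimed at the "quantitative scale" in the question, not something taken from disana().

```r
set.seed(5)
# Simulated stand-in (assumption): 6 plots x 4 species
veg <- matrix(rpois(24, lambda = 3), nrow = 6,
              dimnames = list(paste0("plot", 1:6), NULL))
demodis <- dist(veg)

demodis2 <- as.matrix(demodis)
diag(demodis2) <- NA                       # ignore each plot's distance to itself
mean.dis <- apply(demodis2, 1, mean, na.rm = TRUE)

# Plots ordered from most to least dissimilar to the rest
sort(mean.dis, decreasing = TRUE)

# Optional rescaling to a 0-100 score relative to the most dissimilar plot
round(100 * mean.dis / max(mean.dis))
```

With a dissimilarity bounded between 0 and 1, such as Bray-Curtis, multiplying the mean by 100 would give a score on an absolute rather than a relative scale.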


Dave Roberts



On 11/23/2010 01:09 PM, Pekin, Burak K wrote:

Hello, I want to rank the dissimilarity of sites based on their species 
composition. For example, I would like to be able to say that site A is less 
similar in composition to the other sites than site B is similar to the other 
sites. I could do a cluster analysis and look at which sites are less closely 
clustered.

It would be even better if I could come up with a quantitative scale rather 
than a relative ranking that would give a value for each site based on its 
relative dissimilarity to the rest of the sites. So site A might receive a 90 
out of 100, whereas site B and C might receive a 60 and a 50 indicating the 
rank as well as 'relative quantity' of dissimilarity for each site.

Thanks,
Burak

--

Burak K. Pekin, PhD
Postdoctoral Research Associate
Purdue University




Re: [R-sig-eco] multivariate smoothing and gradient estimation

2009-09-16 Thread Dave Roberts

Chris,

One (sub-optimal) solution would be to fit GAMs and then have 
predict.gam estimate values immediately around your data points, which 
you could use to calculate a local gradient.  If the GAM is reasonably 
smooth, I would think the estimates would be reasonable.
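
A minimal sketch of that idea, assuming the mgcv package (one of the recommended packages distributed with R) and simulated data with four predictors. The helper local_gradient(), the step size h, and the additive model formula are all hypothetical choices for the example; the gradient is approximated by central finite differences on the model's predictions.

```r
library(mgcv)   # provides gam() and its predict method

set.seed(1)
n   <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n), x4 = runif(n))
# Simulated response with a known surface plus a little noise
dat$y <- 2 * dat$x1 + sin(2 * pi * dat$x2) + dat$x3^2 - dat$x4 +
         rnorm(n, sd = 0.1)

fit <- gam(y ~ s(x1) + s(x2) + s(x3) + s(x4), data = dat)

# Central finite-difference gradient of the fitted surface at one point.
# 'point' is a one-row data frame holding all predictors.
local_gradient <- function(model, point, h = 1e-3) {
  sapply(names(point), function(v) {
    up   <- point; up[[v]]   <- up[[v]] + h
    down <- point; down[[v]] <- down[[v]] - h
    as.numeric(predict(model, newdata = up) -
               predict(model, newdata = down)) / (2 * h)
  })
}

pt   <- data.frame(x1 = 0.5, x2 = 0.5, x3 = 0.5, x4 = 0.5)
grad <- local_gradient(fit, pt)   # one partial derivative per predictor
```

The same helper should work with any fitted model that has a predict method accepting newdata, including loess, although the quality of the estimate depends on how smooth the fit is near the point of interest.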


Dave
--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

Chris Martin wrote:

Dear list members,

I am looking for a function that can calculate a surface from at least
four predictor variables and one response variable. I would then like to
calculate the first derivative of a specific point on this surface.

I have looked at many packages for nonparametric smoothing and kernel
density estimation but have been unable to find any that fulfill both these
criteria. For example, loess can handle multivariate data, but I do not
know how to extract the derivative from the resulting fit. Many smoothing splines
offer predict functions to extract the derivative, but these functions can
only handle univariate data (e.g. smooth.spline). Ideally, I would like to
use local estimates of the surface (i.e. loess).

I would appreciate any suitable functions or advice on where to look for
functions that fulfill both these criteria.

Thank you very much for your time.

best wishes,
Chris Martin
Population Biology Graduate Group '12
University of California, Davis



Re: [R-sig-eco] Teaching materials

2009-08-26 Thread Dave Roberts
Volker,

You're welcome to take a look at my site:

http://ecology.msu.montana.edu/labdsv/R

It's primarily for multivariate analysis in community ecology, but does
also have some stuff on GLMs and GAMs in an ecological context.  You
might also visit with Hank Stevens at Miami as he's just next door to
you.

Dave  
-  

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460 



On Wed, 2009-08-26 at 16:39 -0400, Volker Bahn wrote:
 Hi all,
 
 I'll be teaching a 400/600 level bio/ecostatistics class using R this 
 fall (using Introductory Statistics with R as text book). I was 
 wondering if anyone here had any teaching material (lecture slides, 
 exercises, homework, projects, code for teaching etc) that they would 
 like to share? Pointers to material on the web would also be welcome. I 
 found a few courses with course material on the web but they were 
 typically programming oriented and in health sciences rather than 
 ecology. If you'd rather reply off list, I'll summarize the responses 
 for the list.
 
 Thanks,
 
 Volker
 
 Biological Sciences
 Wright State University
 


Re: [R-sig-eco] Clustering large data

2008-10-24 Thread Dave Roberts

Thierry and Hadley,

Sorry to be late coming into this (I forgot I subscribed to sig-eco).

package labdsv has a function called matrify() which takes a three 
column data.frame (sample,taxa,abundance) and creates a full (sparse) 
matrix representation.  I've never tried it on a data set as large as 
yours, and I'm curious if it would work.  It's pure R, but if worst 
comes to worst I used to have a FORTRAN version that would probably 
work. Please give matrify a try and let me know.


Dave R.

matrify <- function (data)
{
    if (ncol(data) != 3)
        stop("data frame must have three column format")
    plt <- data[, 1]
    spc <- data[, 2]
    abu <- data[, 3]
    plt.codes <- levels(factor(plt))
    spc.codes <- levels(factor(spc))
    taxa <- matrix(0, nrow = length(plt.codes), ncol =
        length(spc.codes))
    row <- match(plt, plt.codes)
    col <- match(spc, spc.codes)
    for (i in 1:length(abu)) {
        taxa[row[i], col[i]] <- abu[i]
    }
    taxa <- data.frame(taxa)
    names(taxa) <- spc.codes
    row.names(taxa) <- plt.codes
    taxa
}
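
For a data set with over a million records, the element-by-element loop is the part most likely to be slow. Below is a hedged sketch of a vectorized variant that fills the matrix in one step with a two-column index matrix. The name matrify_fast() and the toy data are hypothetical and not part of labdsv; the sketch also keeps the result as a dense matrix internally, so memory use is not reduced, only the loop is removed.

```r
# Hypothetical vectorized variant of matrify(); not part of labdsv
matrify_fast <- function(data) {
  if (ncol(data) != 3)
    stop("data frame must have three column format")
  plt.codes <- levels(factor(data[, 1]))
  spc.codes <- levels(factor(data[, 2]))
  taxa <- matrix(0, nrow = length(plt.codes), ncol = length(spc.codes),
                 dimnames = list(plt.codes, spc.codes))
  # Two-column (row, column) index matrix: one row per record
  idx <- cbind(match(data[, 1], plt.codes), match(data[, 2], spc.codes))
  taxa[idx] <- data[, 3]          # assign all abundances at once
  as.data.frame(taxa)
}

# Toy three-column data set (sample, taxa, abundance); values are made up
long <- data.frame(sample    = c("p1", "p1", "p2", "p3"),
                   taxa      = c("abla", "pien", "abla", "psme"),
                   abundance = c(3, 1, 2, 5))
wide <- matrify_fast(long)
```

Sample/taxon combinations that do not occur in the input stay at 0, as in the loop version.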


hadley wickham wrote:

Hi Thierry,

Thanks for the more detailed report.  I think the new version of
reshape will help, but I just checked and it's currently a total mess
and will need a lot of work before it's ready for anyone to try.
Unfortunately I'm unlikely to get to it until the ggplot2 book is
finished, so it might be a bit of a wait.

Hadley

On Tue, Oct 14, 2008 at 2:52 AM, ONKELINX, Thierry
[EMAIL PROTECTED] wrote:

Hi Hadley,

Here is a more elaborate report of what I did and what went wrong. The
example is not reproducible because the dataset is too large. A smaller
dummy dataset is not an option as it works with smaller datasets. I'm
willing to run the code again with a development version of reshape.

Cheers,

Thierry



library(RODBC)
library(reshape)

Loading required package: plyr

setwd("d:/wouter")
Sys.info()

                     sysname                      release
                     Windows                           XP
                     version                     nodename
  build 2600, Service Pack 2                   LHPA000838
                     machine                        login
                         x86             thierry_onkelinx
                        user
            thierry_onkelinx

sessionInfo()

R version 2.7.2 (2008-08-25)
i386-pc-mingw32

locale:
LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Dutch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] stats graphics  grDevices datasets  tcltk utils methods

[8] base

other attached packages:
[1] reshape_0.8.1  plyr_0.1       RODBC_1.2-3    svSocket_0.9-5 svIO_0.9-5
[6] R2HTML_1.59    svMisc_0.9-5   svIDE_0.9-5

loaded via a namespace (and not attached):
[1] tools_2.7.2

channel <- odbcConnectAccess("db1.mdb")
km <- sqlQuery(channel = channel,
               query = "SELECT KMhokcode AS Location, TaxonFK AS Species FROM kmhok_periode2_selectie ORDER BY KMhokcode, TaxonFK",
               as.is = TRUE)
odbcCloseAll()
km$value <- 1
dim(km)

[1] 1157024       3

length(unique(km$Location))

[1] 6354

length(unique(km$Species))

[1] 1381

system.time(tmp <- cast(Location ~ Species, data = km[1:1000, ], fill = 0))

   user  system elapsed
   0.11    0.00    0.17

system.time(tmp <- cast(Location ~ Species, data = km[1:1, ], fill = 0))

   user  system elapsed
    1.7     0.0     1.7

system.time(tmp <- cast(Location ~ Species, data = km[1:10, ], fill = 0))

   user  system elapsed
  46.42    0.45   47.02

system.time(tmp <- cast(Location ~ Species, data = km, fill = 0))

Error: cannot allocate vector of size 33.5 Mb
Timing stopped at: 322.95 3.43 327.4

system.time(tmp <- table(km$Location, km$Species))

   user  system elapsed
   1.10    0.00    1.11





ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
[EMAIL PROTECTED]
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-----Original message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On behalf of hadley wickham
Sent: Friday 10 October 2008 14:40
To: ONKELINX, Thierry
CC: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Clustering large data


Thanks for your responses. The biggest problem seems to be cast() for
the reshape package which could not handle the dataset. Peter's solution
using the mefa package worked fine.