Re: [R-sig-eco] predict NMS scores for new samples

2014-10-12 Thread Dave Roberts

Hi Jonathan,

   If you're using the metaMDS function in vegan with the monomds 
engine then it's possible.  I have posted a function (monomds)  at the 
bottom of


http://ecology.msu.montana.edu/labdsv/R/labs/lab9/lab9.html

that shows how to generate a labdsv:::nmds object from vegan's monomds 
function (courtesy of Peter Minchin).  You will have to have package 
vegan loaded to get the monomds FORTRAN code loaded.
Then you can use the function addpoints.nmds (also at the bottom of that 
page) to add points to an existing nmds.


   I'm not convinced that it's a good idea, but I worked with someone 
who needed it to fulfil contract obligations so I wrote it.


Dave

On 10/10/2014 11:07 AM, Jonathan Coop wrote:

hi all,

I'm wondering if anyone has developed code to predict NMS scores for new sample data, as 
the NMDS Scores Prediction algorithm functioned in PC-Ord.  I am looking at the effects 
of multiple fires (reburns) on vegetation composition.  I used an NMS ordination of 
pre-reburn vegetation samples to characterize vegetation, examine relationships between 
vegetation and subsequent fire severity, and stratify a re-sampling effort by key 
vegetation "types".  I'd love to go back now with the post-reburn vegetation 
samples and see where they fall out in the original ordination space.

Any and all thoughts welcome!

Thanks,

Jonathan Coop




[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] a possible hack for adding species scores?

2014-08-29 Thread Dave Roberts

Hi Eliot,

   With respect to leafiness you're in a common problem area.  Ordinal 
variables are hard to handle from any approach.  One alternative (only 
undertaken with a well articulated rationale) is to promote them to 
interval variables,


ordisurf(demo, as.numeric(to.ordinate$leafiness))

If you think the scale is wrong you can try other, e.g.

0,1,4,9,16,25

Otherwise you can do logistic regression on the classes

ordisurf(demo, to.ordinate$leafiness==1, family=binomial)

but you have one fewer degree of freedom than expected which can be a 
problem on a small data set.  Alternatively, you can use envfit as you 
have been doing, but that also assumes they are categorical as opposed 
to ordered.


   Maybe someone else will weigh in with a better alternative for 
ordinal variables.


Dave


On 08/29/2014 11:05 AM, Eliot Miller wrote:

Hi Dave,

Thanks a million for the response! Your solution is good in some
respects--when applied to my real data it shows me why I should be cautious
of assuming any linearity in the response of my continuous variables.
However, it doesn't work for the factors (which makes sense). You can run,

ordisurf(demo, to.ordinate$leafiness)

to see what I mean. Would it be reasonable to use some combination of
ordisurf for the continuous variables, with the locations of the factors
determined via envfit?

Thanks again for the push in the right direction!

Cheers,
Eliot


On Wed, Aug 27, 2014 at 10:24 AM, Eliot Miller  wrote:


Hi all,

A few months ago I ran into a problem that others seem to have had. In
short, if you use the FD package to calculate a Gower distance matrix which
you then either feed back into vegan or use directly in FD, you "lose the
species" scores associated with the points. You can't see what explains the
distances among points. Examples of similar problems are here:

http://r.789695.n4.nabble.com/Ordination-Plotting-Warning-Species-scores-not-available-td4664025.html
https://www.mail-archive.com/r-help@r-project.org/msg153765.html

I am less concerned with the plotting of the results than the previous
posters have been. I am more concerned with simply seeing the table of
"loadings" (don't think that term is correct for the PCoA). Perhaps this is
an acceptable and widespread solution to the problem, and I simply haven't
stumbled on it yet. It takes advantage of the vegan function envfit(). Or
perhaps it is an incorrect solution. I'm wondering which of those two
things it is, and hoping for a bit of input. In my case, I am making a
distance matrix among foraging observations spread across a number of
species. I want to know each species' niche size using the FDis metric. I
made an example dataset and completely worked through example. The main
point here is that I want to know whether the object "fitted" at the end is
correctly explaining how things are "loaded". Any help is greatly
appreciated.

Thanks in advance!
Eliot

library(FD)

#these first few lines are just building up an example dataset of foraging
observations
set.seed(0)
maneuver<-
sample(c(rep("sally",10),rep("probe",10),rep("glean",10),rep("hover",10)))

set.seed(0)
substrate<-
sample(c(rep("leaf",10),rep("flower",10),rep("air",10),rep("branch",10)))

set.seed(0)
attack.height<- rnorm(40, mean=5)

set.seed(0)
relative.height<- rnorm(40, 0.5)
relative.height[relative.height<  0]<- 0
relative.height[relative.height>  1.1]<- 1.1

set.seed(0)
leafiness<- sample(c(0:5), size=40, replace=TRUE)
#convert this to an ordinal variable
leafiness<- ordered(leafiness)

set.seed(0)
distance<- sample(c(1:4), size=40, replace=TRUE)
#convert to ordinal
distance<- ordered(distance)

#here is the complete dataset of example foraging observations
to.ordinate<- data.frame(maneuver, substrate, attack.height,
relative.height, leafiness, distance)

#now make a second table that identifies to which species each observation
belongs
road.map<- matrix(ncol=40, nrow=3, 0)

colnames(road.map)<- 1:40
row.names(road.map)<- c("species1","species2","species3")

#this for loop assigns each foraging observations to one of the species
set.seed(0)
for(i in 1:dim(road.map)[2])
{
 species<- sample(row.names(road.map), 1)
 road.map[species,i]<- 1
}

#run the FD calculations
FDresults<- dbFD(x=to.ordinate, a=road.map, corr="lingoes",
calc.FRic=FALSE, calc.FGR=FALSE, calc.CWM=FALSE, calc.FDiv=FALSE,
print.pco=TRUE)

#use the vegan function envfit() to see what is driving each axis from the
PCoA
fitted<- envfit(ord=FDresults$x.axes, env=to.ordinate, permutations=0,
choices=1:10)




[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology   

Re: [R-sig-eco] a possible hack for adding species scores?

2014-08-28 Thread Dave Roberts

Hi Elliot,

   If you copy (or just rename) your FDresults$x.axes to 
FDresults$sites you can use all the vegan ordi__ functions on your results.


E.g.

FDresults$sites <- FDresults$x.axes
demo <- ordiplot(FDresults)
ordisurf(demo,to.ordinate$attack.height)

which is a little more informative than envfit unless you're convinced 
the response should be linear.


In addition, if you want, you could add the species in

FDresults$species <- t(road.map)
demo <- ordiplot(FDresults)

to see species centroids.

Dave



On 08/27/2014 09:24 AM, Eliot Miller wrote:

Hi all,

A few months ago I ran into a problem that others seem to have had. In
short, if you use the FD package to calculate a Gower distance matrix which
you then either feed back into vegan or use directly in FD, you "lose the
species" scores associated with the points. You can't see what explains the
distances among points. Examples of similar problems are here:
http://r.789695.n4.nabble.com/Ordination-Plotting-Warning-Species-scores-not-available-td4664025.html
https://www.mail-archive.com/r-help@r-project.org/msg153765.html

I am less concerned with the plotting of the results than the previous
posters have been. I am more concerned with simply seeing the table of
"loadings" (don't think that term is correct for the PCoA). Perhaps this is
an acceptable and widespread solution to the problem, and I simply haven't
stumbled on it yet. It takes advantage of the vegan function envfit(). Or
perhaps it is an incorrect solution. I'm wondering which of those two
things it is, and hoping for a bit of input. In my case, I am making a
distance matrix among foraging observations spread across a number of
species. I want to know each species' niche size using the FDis metric. I
made an example dataset and completely worked through example. The main
point here is that I want to know whether the object "fitted" at the end is
correctly explaining how things are "loaded". Any help is greatly
appreciated.

Thanks in advance!
Eliot

library(FD)

#these first few lines are just building up an example dataset of foraging
observations
set.seed(0)
maneuver<-
sample(c(rep("sally",10),rep("probe",10),rep("glean",10),rep("hover",10)))

set.seed(0)
substrate<-
sample(c(rep("leaf",10),rep("flower",10),rep("air",10),rep("branch",10)))

set.seed(0)
attack.height<- rnorm(40, mean=5)

set.seed(0)
relative.height<- rnorm(40, 0.5)
relative.height[relative.height<  0]<- 0
relative.height[relative.height>  1.1]<- 1.1

set.seed(0)
leafiness<- sample(c(0:5), size=40, replace=TRUE)
#convert this to an ordinal variable
leafiness<- ordered(leafiness)

set.seed(0)
distance<- sample(c(1:4), size=40, replace=TRUE)
#convert to ordinal
distance<- ordered(distance)

#here is the complete dataset of example foraging observations
to.ordinate<- data.frame(maneuver, substrate, attack.height,
relative.height, leafiness, distance)

#now make a second table that identifies to which species each observation
belongs
road.map<- matrix(ncol=40, nrow=3, 0)

colnames(road.map)<- 1:40
row.names(road.map)<- c("species1","species2","species3")

#this for loop assigns each foraging observations to one of the species
set.seed(0)
for(i in 1:dim(road.map)[2])
{
 species<- sample(row.names(road.map), 1)
 road.map[species,i]<- 1
}

#run the FD calculations
FDresults<- dbFD(x=to.ordinate, a=road.map, corr="lingoes",
calc.FRic=FALSE, calc.FGR=FALSE, calc.CWM=FALSE, calc.FDiv=FALSE,
print.pco=TRUE)

#use the vegan function envfit() to see what is driving each axis from the
PCoA
fitted<- envfit(ord=FDresults$x.axes, env=to.ordinate, permutations=0,
choices=1:10)

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] fonction spc.pres in labdsv package

2013-04-26 Thread Dave Roberts

Hi Lisa,

   Sorry, I'm several days behind in monitoring the list.  I'm not sure 
I can help without a copy of your data.  Since your dataframe appears to 
have both strings ("i","p",etc) and integers I'm not sure what you're 
getting.  table(unlist(veg)) ignores NAs, although I think vegtrans 
would have warned you about NAs.


   If the data file is too large to send as an attachment (to me, not 
the list), perhaps you could show snippets.  E.g.


> veg<-read.csv(file=file.choose(),dec=",",sep=";", header=TRUE)
> any(is.na(veg))
> veg[1:10,1:10]

and then

> veg<-read.csv(file=file.choose(),dec=",",sep=";", header=TRUE)
> any(is.na(newveg))
> newveg[1:10,1:10]

and then

> spc.pres<- apply(newveg>0,2,sum)
> any(is.na(spc.pres))

Dave Roberts


On 04/19/2013 06:49 AM, lisa couet wrote:

Hi,

I have an issue concerning the fonction spc.pres in package labdsv.
the message is:

Erreur dans plot.window(...) : 'xlim' nécessite des valeurs finies
De plus : Messages d'avis :
1: In min(x) : aucun argument trouvé pour min ; Inf est renvoyé
2: In max(x) : aucun argument pour max ; -Inf est renvoyé
3: In min(x) : aucun argument trouvé pour min ; Inf est renvoyé
4: In max(x) : aucun argument pour max ; -Inf est renvoyé

I code:

veg<-read.csv(file=file.choose(),dec=",",sep=";", header=TRUE)
attach(veg)

((to change my data into  Braun Blanquet index ))

library(labdsv)
x<-c("i","p","p1","p2","p3",1,12,13,14,15,21,22,23,24,25,31,32,33,34,35,41,42,43,44,45,51,52,53,54,55)
y<-c(0.5,3.0,3.0,3.0,3.0,15.0,15.0,15.0,15.0,15.0,37.5,37.5,37.5,37.5,37.5,62.5,62.5,62.5,62.5,62.5,85.0,85.0,85.0,85.0,85.0,97.5,97.5,97.5,97.5,97.5)
newveg<-vegtrans(veg,x,y)

(( abondance distribution))

ab<- table(unlist(newveg))

((abondance s'affiche bien)

((Number of occurrences))

spc.pres<- apply(newveg>0,2,sum)
plot(sort(spc.pres))

the error message is here.

I have many NA because it is a file where columns are species and row 
elevation. So when specie is not present, nothing is in the cell.

I hope one of you can help me,


kind regards,
lisa

[[alternative HTML version deleted]]




___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Zuur / Pinierho / Faraway

2012-11-30 Thread Dave Roberts

Philip,

   IS there an online errata,or do you just have to be smart and diligent?

Thanks, Dave

On 11/29/2012 07:30 AM, Dixon, Philip M [STAT] wrote:

I agree with all the previous comments and second Tom's recommendations of Faraway 
as an 'in between' Zuur and Piniehro&  Bates.  One thing to be careful of: 
While the advice in Faraway is sound, there are more than a few mistakes in his 
equations.

Philip Dixon

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] IndVal groups & Plotting an RDA with labels

2012-07-03 Thread Dave Roberts

Caitlin,

Depending on the size of your data set one way to identify clusters 
is to label the dendrogram with cluster number.  Lets say your Ward's 
result is called 'ward.hcl', and your cluster membership (perhaps 
returned from cutree) is in a vector called 'clustid'


> plot(ward.hcl,labels=clustid)

will produce the dendrogram with cluster number identified on the 
bottom.  If the number of plots is too high they overwrite and it gets 
hard to read, but usually you can make it out if you stretch the plot.


   Alternatively, size is sometimes indicative, and you can simply

> table(clustid)

and see if cluster size is unique.

Dave Roberts

On 07/03/2012 07:12 AM, Caitlin Porter wrote:

Dear all,


I am working with a large data set to examine plant community structure on
barrens habitats and have a couple of questions regarding Indval (labdsv
package) on Wards Clusters (hclust function in stats package) and plotting
axes of an RDA result with plot, plot.cca or esqplot functions (vegan,
MASS, graphics packages)

1.  I have run IndVal (package labdsv) on a Ward’s Cluster Analysis to
determine indicator species for groups. I selected k=5 groups as my most
meaningful and conservative (sample size, etc) classification. However,
according to my average silhouette width, k=12 is the most optimal number
of groups.  I ran an IndVal on k=12 out of curiosity and notice some
interesting patterns I would like to explore further but I am having
difficulty understanding the output. The groups are labeled in the
indicator species output as “group 1,2,3,4….12”. I do not understand how to
determine which group is associated with which cluster (e.g.. in my Ward’s
dendrogram – which is group 11?). It is somewhat obvious when k is
relatively small because of the order the groups are clustered in, branch
height in the dendrogram, and species frequency and abundances, however I’d
like to know for larger groups. It would be ideal if I could even label the
dendrogram with these groups. I’ve seen examples of these in a couple of
papers with color coded boxes, but I can’t  seem to figure out how to code
it myself.

  2.  My second question relates to plotting an RDA. I have been able to
run an RDA in vegan package successfully but unable to plot it in a way
that I can interpret. I need to label sites, species (response matrix) and
environmental variables/PCA axes (explanatory matrix).

  So far, I’ve only been able to label either the response matrix or the
explanatory matrix in my graphs, but not all 3 sets of points. I’ve tried
modifying plot function and code from Borcard et al 2011, (Numerical
ecology with R), esqplot code for MASS package and plot.cca in Vegan
package.  I would prefer to use esqplot since I understand  already how to
better customize it, but I’m just looking to get any graph I can read at
this point.  When I use the plot function from Borcard et al. I see PC axes
names only. When I use esqplot, I see species names only.  I also tried
plot.cca in vegan package but wasn’t able to call up a graph. This code
looks like a great way to do it, but I’m not sure what I’m doing, *e.g*.
what to put in for const. or what the ‘unexpected symbol’ error means.

  This old thread asks a similar question (
https://stat.ethz.ch/pipermail/r-help/2009-February/188282.html), but I’m
not sure I understand its solution and have approximately 300 species so
providing a separate name for each individually might not be feasible. This
other thread asks another similar question (
http://r.789695.n4.nabble.com/RDA-Triplot-td3055474.html) but the author
finds an error is generated specifying that biplot is not an appropriate
method.

I have included my code for question 2 below.Any help would be very much
appreciated!


Sincere thanks,


Caitlin


*#esqplot (MASS package) *

library(MASS)

#subset species and sites scores from the rda for first 10 RDA axes

sr<- scores(c1.rda, display = c("sites", "species"), choices =
c(1,2,3,4,5,6), scaling = 2)

sites.only<- as.data.frame(sr$sites)

srsp<- as.data.frame(sr$species) # data frame with just the species in it

c1.site<- c1[,1] # object with just the site names from the original data
set

cp.m<- merge(c1.site, sites.only, by=0, sort=FALSE) # merged site object
with site names

eqscplot(cp.m$RDA1, cp.m$RDA2, xlim=c(-1, 1), ylim=c(-1, 1), col="blue",
xlab="RDA Axis 1", ylab="RDA Axis 2", cex=0.3) # defining the variables&
limits of plot and what the symbols look like

text(cp.m$RDA1, cp.m$RDA2, labels=cp.m$x, col="black", cex=0.3) #adding
names on plots

text(srsp$RDA1, srsp$RDA2, labels=rownames(srsp),col= "red", cex=0.3) #adding
names/species

# Error in text.default(cp.m$RDA1, cp.m$RDA2, labels = cp.m$x, col =
"black",  :   zero length 'labels'



*#Borcard et al. 2011 plot function*

plot(c1.rda, main= "Triplot RDA - scaling 2 -

Re: [R-sig-eco] Fwd: Landscape ecology in R

2012-03-13 Thread Dave Roberts

Manuel,

   Take a look at Jason Fridley's pages at

http://plantecology.syr.edu/fridley/#gsmnp

He's primarily using GRASS, but has integrated it with R in many cases.

Dave Roberts

On 03/11/2012 10:06 PM, Manuel Spínola wrote:

-- Forwarded message --
From: Manuel Spínola
Date: 2012/3/11
Subject: Re: [R-sig-eco] Landscape ecology in R
To: Sarah Goslee


Thank you Sarah and Stephen,

I am in contact with Paul Galpern who developed some tutorials on landscape
ecology in R (the ones that Stephen pointed out).  He told me that he plan
to release more in the following months. I offered him my contributions
because I have some r code to develop spatially explicit occupancy models
using landscape metrics, combining the raster, dismo and unmarked packages.
I know that Kevin McGarigal at U Mass has some R material too.

My idea with the message was to know if there is some tutorial on how to
use r because many tools on landscape ecology analysis are produced as
tools or extensions fro commercial software such as ArcView or ArcGis. Even
open source QGIS do not have a well developed landscape analysis tools for
landscape ecology, except the one developed for GRASS.

Is not a bad idea to develop R material for landscape ecology.  I know that
there are a lot of packages or tools that can do the job but I am thinking
on some good tutorials on how to combine those tools.  I am an advocate of
R and open source material and I am sure many in our list would like to use
as teaching material. I am teaching a course on wildlife habitat evaluation
(that include a lot of material on Landscape Ecology) and I am using R for
the labs.  The students really like to see all these analysis in R.

Best,

Manuel




___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] [R] Component analysis / cluster analysis of multiple sites based on soil characteristics

2012-01-23 Thread Dave Roberts

Sacha,

   I do not fully understand your objectives, but there are several 
things to bear in mind in your approach below.   You refer to your 
result object as water.pca, but it's simply a distance matrix, not a 
PCA.  More problematic, perhaps, is that it's calculated on a matrix 
with very different values for the columns, e.g. temp is > 30 and no2 is 
< 0.01.  In calculating Euclidean distance (the default for dist()) 
these scales matter a lot.  If it's truly a clustering of sites based on 
these attributes you want you should standardize the columns before 
calculating dist().


   Once you have a distance matrix from the standardized data you could 
use pam or agnes (as you have already done) but might also want to see 
an ordination.  Given a Euclideandistance matrix I would recommend 
Principal Coordinates Analysis (PCO or PCoA depending on source) which I 
believe is available in the ecodist package you already have loaded.


Dave Roberts

On 01/23/2012 05:51 AM, Sacha Viquerat wrote:

Hello dear list!
Maybe I am demanding too much, but I am having problems finding the
right way to tackle a seemingly trivial problem:

We counted fish at different sites. In order to assess habitat quality
at each site, we sampled temperature, pH etc. at each site, resulting in
243 observations of 8 independent variables. As we would like to
identify clusters within this data set, we stumbled upon three
approaches: two as realized in package cluster, using dist to create a
distance matrix from our numeric variables and then pam to produce a
model or agnes and then various tree methods to simplify the tree, as
well as an approach via the ecodist package (using distance and pco).
while results obtained through the cluster package were the same
(phew!), the result from the ecodist approach did not identify clusters
at all. As we are all confused and I am the one in charge of deciding
which way to go, And as I am the one most confused after all, I am
completely lost. Doing such an anlysis for the first time, I would be
satisfied wit the pam approach identifying 2 clusters (via iterating
over each k in 2:10 and picking the max average silhouette of each
model). However, as there are so many different approaches out there, I
am not sure if all the assumptions are met! It seems for example that pH
is more or less randomly distributed. Should we keep such variables? How
can I access the actual loadings of the principal axis of the pam model?
Couldn't find that anywhere! In the end, there are only 33 observations
in the 2nd group, which will be making the further analysis of fish
counts heavily unbalanced. Any suggestions?

Code snippet:

water.par

temp pH DO BOD COD no3 no2 po4
1 33.5 7.4 5.30 4.04 15.0 0.120 0.008 0.20
2 33.5 7.4 5.30 4.04 15.0 0.120 0.008 0.20
.
.
.
243 29.1 7.4 6.80 12.56 45.0 0.740 0.002 0.32

water.pca<-dist(as.matrix(water.par))
k=best.k(water.pca,c(2,10),stand=T,trace=1) #finding the k with highest
average silhouette dist
clus.model<-pam(water.pca,k,stand=T)

clus.model$clusinfo

size max_diss av_diss diameter separation
1 210 30.12712 8.445552 42.29439 27.88689
2 33 12.74630 7.725452 21.91972 27.88689

water.md <- distance(water.par, "euclidean")
water.pco<-pco(water.md)
plot(water.pco$vectors[,1], water.pco$vectors[,2])

Thanks in advance and sorry for the verbosity level at max!!!

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Recommended R package for analyzing community data set? - nested design, stratified random sampling, covariates

2012-01-17 Thread Dave Roberts

Dear Laura,

   David is certainly correct that vegan would provide a wealth of 
tools for the analysis of your data.  More generally, if you go to CRAN 
and browse the "environmetrics"  Task View you will see an review of 
many packages suitable for specific analyses you may be interested in.


Dave Roberts

On 01/15/2012 12:10 PM, Laura S wrote:

Dear all:

May you recommend an R package for analyzing this data set? I would greatly 
appreciate any thoughts you can provide.


I. Study goals

This study examines soil crust (lichens and bryophytes) recovery and succession 
in fields that underwent different levels of disturbance.


II. Variables
Response variables of interest: soil crust cover (categorical scale - described 
below), species richness, species composition
Explanatory variable of interest: disturbance regime (categorical variable)
Environmental variables measured (covariates - mix of categorical and numerical 
variables): cover of mineral soil, litter, vascular plant bases, stones, or 
rocks,
slope, aspect

III. Study sampling and design
Eight research areas (BR, CB, CC, JL, PC, PL, SL, TR)
Within each research area subplots were assigned six disturbance treatments 
(NC, NS, OC, OS, SC, SS) based on disturbance history
A single transect was placed randomly in the center of each subplot and sampled 
in twenty 20 x 20 cm plots at 1 m intervals along the transect
47 of 48 possible treatment subplots were sampled (n=6 for 7 sites, n=5 for 1 
site)


Sampling cover scale:

Scale valueRepresentative % cover

1<= 1
2>1-4
3>4-10
4>10-25
5>25-50
6>50-75
7>75-95
8>95-100


There were three different sampling times (spread over two years), but time of 
sampling was not considered as a confounding factor given the way sampling was 
conducted with the particular communities studied (soil crust communities).

Total species positively identified: 33 taxa (species and species groups), 15 
of these were in four or more
of 47 subplots (n=6 x 7, n=5 x 1)
Unidentified collected taxa: less than 0.5%
Approximate taxa pool (species observed in entire areas, but not necessarily in 
sample plots): 26 lichen + 21 bryophytes
= 47 taxa

Thank you for your time and consideration,
Laura
[[alternative HTML version deleted]]




___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] The problems on BIO-ENV procedure by [R]

2011-12-05 Thread Dave Roberts

Shun Tsuobi,

   isoMDS (and routines based on it) will not allow zero dissimilarity, 
which implies perfect replicates.  One alternative is to remove one of 
the replicates, but that may have unsatisfactory effect on further 
analyses.  Alternatively, if you are sure you want to keep the duplicate 
plots you can change the zero to a small value


d[d==0] <- 0.0001

and run the isoMDS on the resulting revised dissimilarity matrix.

Undoubtedly someone from the vegan group will respond to your second 
question.


Dave Roberts

On 12/05/2011 01:28 AM, 坪井 隼 wrote:

Dear Madam / Sir,

I have two questions for the use of the “R” program for the ecological
research. I am studying the relationship between the community
structures of environmental microbes and some environmental conditions.
For this objective, I have known that the BIOENV procedure, which was
developed by Clarke&  Ainsworth (1993), is available on the “R” software.


Fist question;
I attempted the use of the procedure to analyze the relationship between
the variation of the microbial community structures and the
environmental factors. However, I can not analyze the relationship based
on isoMDS function. The isoMDS was inacceptable for my dataset. The
command for the BIOENV procedure, which I programmed, and the error
massage I gained was as follow;


library(MASS)
library(vegan)
communitydat<-read.table("C:/Documents and Settings/shuntsuboi/desktop

/bray.txt", head- er=T)

environdat<-read.csv("C:/Documents and Settings shuntsuboi/desktop/ev.

csv",header=T)

env<-environdat[,c("variablesA","variablesB.","variablesC","variablesD

","variablesE")]

d<- vegdist(communitydat, "bray")
isoMDS(d)

error   isoMDS(d) : zero or negative distance between objects 1 and 2

As mentioned above, I can not run the program because the error, which
is “isoMDS(d) : zero or negative distance between objects 1 and 2”,
occurred. What are the ways to solve this problem ? On isoMDS function,
what are the ways that the zero distance of “Bray-Curtis distance” is
acceptable in the function ?

Second question;
Based on the command as above, I ran the metaMDS function. However,
although I could automatically describe the two dimensional ordination
plot figure, I could not gain the X and Y value of the respective plots.
Then, the error massage was shown as follow;
“In ordiplot(x, choices = choices, type = type, display = display,  :
Species scores not available”

What are the ways to solve this problem ?

Sincerely yours,
Shun Tsuboi

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] post hoc in Kruskal Wallis

2011-12-01 Thread Dave Roberts

If the design is balanced this function will work

npcompare <- function (var,class)
{
k <- length(table(class))
n <- length(var)/k

se <- sqrt((n*(n*k)*(n*k+1))/12)
rvar <- rank(var)

for (i in 1:(k-1)) {
left <- sum(rvar[as.numeric(class)==i])
for (j in (i+1):k) {
right <- sum(rvar[as.numeric(class)==j])
diff = abs(left-right)
cat(paste(i,j,left,right,diff,round(1-pt(diff/se,k),4),'\n'))
}
}
cat('\n')
}

It's not formatted very elegantly, but it works.

Dave R.

On 11/24/2011 08:01 AM, Richard L. Boyce wrote:

Zar, in Biostatistical Analysis, has a Tukey-like test that is highly
recommended. You may need to do it by hand, but it is quite easy.


Message: 5
Date: Wed, 23 Nov 2011 18:21:54 +0100
From: "Jakub Szymkowiak"
To:
Subject: [R-sig-eco]  post hoc in Kruskal Wallis
Message-ID:
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
reply-type=original

Hi,
does anyone know, how can I perform post-hoc tests (especially Least
Significant Difference and Sheffe Test) for results from Kruskal-Wallis
test? In KruskaI-Wallis test I found some significant differences between
tested groups, but I want to know between which groups this difference is
really signifficant.

Cheers,
Jakub




--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Sources for learning mapping/GIS with R?

2011-10-25 Thread Dave Roberts
R and GRASS can both be made to talk to the same database (I use 
PostgreSQL) at the same time, allowing you you to use the best tool for 
each job and keep everything in sync.


You might also want to look at

http://pvanb.wordpress.com/

and Dylan Beaudette at

http://casoilresource.lawr.ucdavis.edu/drupal/node/905

Dave Roberts

On 10/25/2011 11:58 AM, Marcelino de la Cruz wrote:


Also, for open source GIS ( GRASS and R)

Neteler, M., Mitasova, H., 2008. Open Source GIS: A GRASS GIS
Approach, 3rd Edt. Springer, The International Series in Engineering
and Computer Science: Volume 773. 406 p.

By the way, you can find a lot of useful resources in the home page
of Tomislav Hengl:
http://spatial-analyst.net/wiki/index.php?title=Main_Page

Cheers,
Marcelino


At 19:47 25/10/2011, Sarah Goslee wrote:

There's a spatial taskview that's the perfect place to start:
installing the taskview will
give you very nearly all the related packages.

http://cran.r-project.org/web/views/Spatial.html

For learning how to use them, Bivand et al's Spatial Analysis book is good:
http://books.google.com/books/about/Applied_spatial_data_analysis_with_R.html?id=4gg8yx_lcj0C

Sarah

On Tue, Oct 25, 2011 at 1:11 PM, Raphael Mazor  wrote:

Can anyone recommend some basic resources for learning how to create maps
with R?
Or recommend packages that are particularly helpful?

Thanks!--





--
Sarah Goslee
http://www.functionaldiversity.org

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




Marcelino de la Cruz Rot
Depto. Biologia Vegetal
EUIT Agricola
Universidad Politecnica de Madrid

tel: 34 + 913365435

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Multivariate ANOVA/repeated measures

2011-10-10 Thread Dave Roberts



On 10/10/2011 02:15 PM, Gavin Simpson wrote:

On Mon, 2011-10-10 at 09:11 -0700, Rich Shepard wrote:

On Mon, 10 Oct 2011, Dave Roberts wrote:


I want to compare the results of the two sampling exercises in order to
test the performance of the two sampling techniques.



   I would try something pretty direct. Any appeal to differences in
dissimilarities confounds the effects with the particular
dissimilarity/distance matrix you use. Assuming the samples and species
are in the same order, and that the data.frames are the same size, you
might try


I did not read the original message, so I hope you'll allow me to join the
thread. My recommendation is to use univariate tree models, particularly a
classification tree (for ordinal explanatory variables; i.e., ST1 and ST2).


But the response here is *multivariate* - of course, one could use Glen
De'Ath's multivariate regression trees (despite the name it is really a
constrained clustering/classification) - but I think there are better
ways of solving this particular problem. And unless one has many 100s of
observations, the model will need some sort of variance reduction
applied (via bagging, or some such) as the one fitted model is
potentially highly unstable.

G


This is fully, carefully, and non-technically explained in Chapter 9
(particularly Sections 9.3 and 9.4) in Zuur, Ieno, and Smith "Analysing
Ecological Data." For that matter, I highly recommend reading the whole
book.

Rich

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




It would be fairly simple to boil down to a univariate question.  You 
could do something as simple as a paired t-test of plot-level species 
richness or the number of individuals sampled (to compare sampling 
efficiency), but I still don't see an independent and a dependent variable.


Dave
--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Multivariate ANOVA/repeated measures

2011-10-10 Thread Dave Roberts



On 10/07/2011 08:51 AM, Dr N.A. Cutler wrote:

Dear All,

I have a query about multivariate analysis of community data.

In my experiment, 24 microbial communities in different locations were
sampled using Sampling Technique 1 (ST1). A site X species matrix was
then derived by molecular analysis.

The same 24 locations were then sampled again using a different sampling
technique (ST2) and a second site X species matrix was derived. It is
assumed that community structure remains intact after sampling by
Technique 1 i.e. the two techniques can sample from the same pool of
organisms.

I want to compare the results of the two sampling exercises in order to
test the performance of the two sampling techniques. My research
question is: does Technique 1 produce a similar signal to Technique 2?
Or do the different techniques give significantly different pictures of
community structure? The null hypotheses is that there is no significant
difference between the two sampling techniques i.e. they both capture
community structure with the same degree of accuracy.

It occurred to be that I could use a multivariate ANOVA technique (e.g.
Adonis) to distinguish between the results of the two sampling
exercises, using sampling technique as a factor. But I am not sure how
to deal with the obvious correlation between sample pairs. Should this
situation be addressed as a repeated measures experiment with two time
steps? If so, what is the best technique to use (a mixed model, perhaps?)

Any advice would be gratefully received.

Best wishes,

Nick Cutler

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Nick,

   I would try something pretty direct.  Any appeal to differences in 
dissimilarities confounds the effects with the particular 
dissimilarity/distance matrix you use.  Assuming the samples and species 
are in the same order, and that the data.frames are the same size, you 
might try


> actual <- sum((ST1-ST2)^2)

and then permute one of the two matrices numerous times

res <- rep(NA,999)
for (i in 1:999) {
 res[i] <- sum((ST1-ST2[sample(1:nrow(ST2),replace=FALSE),])^2)
}
final <- (sum(res <= actual) + 1)/1000

and see what fraction of the permuted matrices are as similar.

Hopefully Gavin will weigh in with a better randomization.

   If you do go with a multivariate approach I might try a procrustes 
analysis of PCO ordinations.


Dave
--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] help with analysis

2011-08-19 Thread Dave Roberts

Hi Kátia,

   Sorry to be so late; just back in the office.

   Pedro is correct that adonis will help establish statistical 
significance to potential differences, but I think NMDS could still be 
very helpful.  One approach would be to code the sites by glyph (e.g. 
site 1 = circle, site 2 = triangle, etc.) and then to draw arrows from 
the first date to the second, second to third, etc, for each site.


   In labdsv you could do this if your nmds is called nmds.object

> plot(nmds.object,type=n')

to draw the axes but not the points, and then use

> points(nmds.object,site==1,pch=1)
> points(nmds.object,site==2,pch=2)

etc.  Then, depending on how the sites are sorted, the arrows could be drawn

> 
arrows(nmds.object$points[1,1],nmds.object$points[1,2],nmds.object$points[2,1],

  nmds.object$points[2,2])

to draw an arrow form the first point to the second.  YOu might need to 
mess with the parameters of arrows() to get the arrow sizes you want.


   In vegan you could do

> nmds.plot <- plot(metaMDS(dissimilarity or taxon matrix),type='n')
> points(nmds.plot$sites[site==1])
> points(nmds.plot$sites[site==2],pch=2)
etc
> arrows(nmds.plot$sites[1,1],nmds.plot$sites[1,2],
   nmds.plot$sites[2,1],nmds.plot$sites[2,2])

etc.

Hope that helps, Dave
--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

On 08/08/2011 08:20 AM, Kátia Emidio wrote:

Dear all,
I have a data from inventory of plants in 6 fragmented forests of the same
size,randomly selected, measured during 3 periods of time, and I'd like to
know about changes in species composition over time. It is a good idea to
use NMDS for each period of time
and then to compare the changes in the ordination diagram?? What kind of
analysis could be better??
Cheers,
Katia

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] controlling the species displayed on a biplot

2011-05-25 Thread Dave Roberts
Thanks Gavin.  That's actually very nice and I suspect just what Andy 
was looking for.


Dave

On 05/23/2011 04:33 AM, Gavin Simpson wrote:

require(vegan)
require(labdsv)
data(dune)
data(dune.env)

## Indicators of Management type
set.seed(38)
inds<- with(dune.env, indval(dune, Management))

## ordinate
dune.pca<- rda(dune, scale = TRUE)

## exract scores you want to plot presume species and sites
dune.site<- scores(dune.pca, display = "sites", scaling = 3)
dune.spp<- scores(dune.pca, display = "species",
 scaling = 3)[inds$pval<= 0.05, ]

## plot
plot(dune.pca, display = c("sites","species"), type = "n",
  scaling = 3)
points(dune.site)
arrows(0, 0, dune.spp[,1], dune.spp[,2], col = "red", length = 0.05)
lab.xloc<- dune.spp[,1] * 1.2
lab.yloc<- dune.spp[,2] * 1.2
text(lab.xloc, lab.yloc, rownames(dune.spp), col = "red", cex = 0.8)



--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] controlling the species displayed on a biplot

2011-05-21 Thread Dave Roberts

On 05/20/2011 11:54 PM, Andrew Halford wrote:

Hi All,

I have done a IndVal analysis on a dataset which has identified 27 species
out of 57 as being significant arbiters of site differences. What I want to
do is a PCA biplot (data is hellinger transformed prior to this) with only
these 27 species on the ordination, not all 57, How can I control the
species I want displayed?

cheers

Andy


--
Andrew,

   We need a little more information.  Which IndVal function did you 
use?  It should return a vector of p-values you can use as a mask [p-val 
<= 0.05,] on your taxon matrix.  But are you saying you want to use all 
57 in constructing the PCA and only identify 27, or that you only want 
to use those 27 in the construction of the PCA?


Dave Roberts

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] The final result of TWINSPAN

2011-04-27 Thread Dave Roberts

Thanks Jari,

My original thought was to write a wrapper for the original FORTRAN 
code by replacing the file read of data with data passed from R, and 
then bringing in the results in a list.  That would allow sizing the 
arrays at run-time and eliminating fixed array sizes.  I have a copy of 
the FORTRAN code that Petr Smilauer modified for simplified 
input/outputand that helped.  Still, it ultimately appeared pretty messy 
(but still might be the best route), so I tried separating out the 
subroutines and calling them individually from R.  From there I tried 
replacing some of the subroutines with native R to lower overhead.  But 
in the end I just couldn't understand the code well enough to make it work.


So then I thought I should write write a totally transparent 
version in native R, even if it doesn't replicate the original.  On the 
down side people can say it's not correct; on the upside it's open 
source and people can evaluate it and modify it as they see fit.  So, if 
there is interest I might post the code and examples on my web page and 
let somebody else have it to run with.


Dave

Jari Oksanen wrote:

On 27/04/11 00:40 AM, "Dave Roberts"  wrote:

 Earlier this year on an (undoubtedly ill-advised) lark I coded up
an R version of TWINSPAN.  It's far from a polished package at this
point, but the code does run.  One of the interesting features is that
you can partition a PCO or NMDS in addition to the traditional CA. To be
clear, I am not a TWINSPAN fan either, but I wanted it for a methods
paper I was working on.

 The problem is that I based the code on Hill, Bunch & Shaw (1975,
J of  Ecol  63:597-613) which is what I had available.  Apparently the
algorithm in the commercial TWINSPAN is significantly modified from the
original, but I couldn't find a description of the actual algorithm
anywhere in the literature.  It is probably described in the User Manual
of the software, but I was not sufficiently motivated to chase down a
copy.  I do have a copy of the FORTRAN code, but it was apparently
written in FORTRAN II, and is basically inscrutable, even to an old
FORTRAN dog like me.

 So, if somebody has a clear description of the actual algorithm
(and I think it is disturbing that I could not find one), it would be
possible to code it up in native R.  The alternative, to write a wrapper
for the original FORTRAN code is not a trivial task.  I gave it a couple
of days and gave up.


Dave,

Hill, Bunch & Shaw describe the general idea of TWINSPAN, but the
implementation is more complicated. Martin Kent and Paddy Coker do a great
job of explaining the twists in their book ("vegetation description and
analysis: a practical approach"). If I remember correctly, the TWINSPAN
manual also was more detailed, but I lost it somewhere when I moved around
(for the kids: it was a bunch of paper: pdf was not yet invented when
TWINSPAN was published).

I don't think that the actual TWINSPAN is easily extended beyond CA. Each
step is a two-stage one-dimensional ordination on a current subset, where
the first stage selects indicators and the second stage is polarized for the
indicator species. The final split is based on site ordination and
indicators are secondary (which we see in misclassifications if you try to
use the provided key for the data that was classified in TWINSPAN). The
polarization stage is particularly challenging when working with
dissimilarities (PCO, NMDS).

I don't think that the FORTRAN I have is completely impenetrable. I think
the largest problem is the design principle: R code should run silently and
return a result, but TWINSPAN prints when it goes on and returns only a part
of the result. Incorporating that in R would need stripping most PRINT and
WRITE and have subroutines to return useful data directly.

I also wrote a small funny test on TWINSPAN principle, where the splitting
and pre-defined pseudospecies where replaced with regression tree split.
I'll send you a copy of that and the FORTRAN (IV, I think) code I have in a
separate message.

Cheers, Jari Oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] The final result of TWINSPAN

2011-04-27 Thread Dave Roberts

Dear Zoltan,

   Thanks for the note.  The R function I wrote does in fact follow the 
Roleček et al protocol, and that's partly what motivated the idea to 
write it up. Lubomír Tichý, Petr Smilauer, and Laco Mucina have all 
contributed information in the development, but I've still been stymied 
by the lack of solid information on the actual algorithm.


   I think it is quite possible to write a function that operates on 
the principle of TWINSPAN, following Roleček et al, but writing a 
function that exactly matches the output from the commercial package may 
prove to be too much trouble.


Thanks, Dave

Zoltan Botta-Dukat wrote:

Dear Dave,

This modified version of TWINSPAN may be interesting for you when you 
compare methods:


Modified TWINSPAN classification in which the hierarchy respects cluster 
heterogeneity


Jan Roleček, Lubomír Tichý, David Zelený, Milan Chytrý 2009 Modified 
TWINSPAN classification in which the hierarchy respects cluster 
heterogeneity Journal of Vegetation Science 20(4): 596–602
http://onlinelibrary.wiley.com/doi/10./j.1654-1103.2009.01062.x/abstract 



Zoltan

2011.04.26. 23:40 keltezéssel, Dave Roberts írta:

Dear List,

Earlier this year on an (undoubtedly ill-advised) lark I coded up an R 
version of TWINSPAN. It's far from a polished package at this point, 
but the code does run. One of the interesting features is that you can 
partition a PCO or NMDS in addition to the traditional CA. To be 
clear, I am not a TWINSPAN fan either, but I wanted it for a methods 
paper I was working on.


The problem is that I based the code on Hill, Bunch & Shaw (1975,
J of Ecol 63:597-613) which is what I had available. Apparently the 
algorithm in the commercial TWINSPAN is significantly modified from 
the original, but I couldn't find a description of the actual 
algorithm anywhere in the literature. It is probably described in the 
User Manual of the software, but I was not sufficiently motivated to 
chase down a copy. I do have a copy of the FORTRAN code, but it was 
apparently written in FORTRAN II, and is basically inscrutable, even 
to an old FORTRAN dog like me.


So, if somebody has a clear description of the actual algorithm (and I 
think it is disturbing that I could not find one), it would be 
possible to code it up in native R. The alternative, to write a 
wrapper for the original FORTRAN code is not a trivial task. I gave it 
a couple of days and gave up.




___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] The final result of TWINSPAN

2011-04-26 Thread Dave Roberts

Dear List,

Earlier this year on an (undoubtedly ill-advised) lark I coded up 
an R version of TWINSPAN.  It's far from a polished package at this 
point, but the code does run.  One of the interesting features is that 
you can partition a PCO or NMDS in addition to the traditional CA. To be 
clear, I am not a TWINSPAN fan either, but I wanted it for a methods 
paper I was working on.


The problem is that I based the code on Hill, Bunch & Shaw (1975,
J of  Ecol  63:597-613) which is what I had available.  Apparently the 
algorithm in the commercial TWINSPAN is significantly modified from the 
original, but I couldn't find a description of the actual algorithm 
anywhere in the literature.  It is probably described in the User Manual 
of the software, but I was not sufficiently motivated to chase down a 
copy.  I do have a copy of the FORTRAN code, but it was apparently 
written in FORTRAN II, and is basically inscrutable, even to an old 
FORTRAN dog like me.


So, if somebody has a clear description of the actual algorithm 
(and I think it is disturbing that I could not find one), it would be 
possible to code it up in native R.  The alternative, to write a wrapper 
for the original FORTRAN code is not a trivial task.  I gave it a couple 
of days and gave up.


--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

On 04/14/2011 01:57 AM, Jari Oksanen wrote:

On 14/04/11 10:37 AM, "Yong Zhang"<2010202...@njau.edu.cn>  wrote:


Dear all,

I conducted the two-way indicator species analysis using TWINSPAN program, and
following is the final result:

  0111
  00011011
  011000111
   01001001

I have to certify my analysis, I want to classify the above 24 sampling sites
into 3 major groups based on 7 biotic metrics. The name of my 24 samples could
be site1 to site24, from the left to the right, and I set the cut levels 0, 2,
5, 10, 20,  the maximum level of divisions: 6, and maximum group size for
division:3 .

Now, my question is whether my setting is correct? And how should I classify
these sites into 3 groups accoding to this final result?

Dear Yong Zhang,

This is not an R issue, because there is no TWINSPAN in R. However, the
answer to your question is that strictly speaking you cannot group your data
into three major groups with TWINSPAN. TWINSPAN is a bisection method so
that first division gives you two groups, and second splits each of these
into two groups so that the next choice is to have four groups. However, in
this case one of the groups was so small (3 plots were split off from other
in the first division, and then these were split into groups of 2 plots and
1 plot) that you probably can ignore the second division of the small group.

If your goal was as vague as wanting to classify 24 sites into 3 major
groups you could do better than use TWINSPAN: what's the problem with proper
classification methods in R? Moreover, have you checked that your "biotic
metrics" suit to the pseudospecies cut level concept of TWINSPAN?

Cheers, jari oksanen

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Dissimilarity ranking

2010-12-09 Thread Dave Roberts
Well, it was not only a little sloppy, but a little late.  Reading 
further I saw others had made the same suggestion a while ago. 
Nonetheless, that will remind me to paste my emails into R and check.


Thanks François

On 12/09/2010 03:34 PM, francois Guilhaumon wrote:

Dear all,

I would like to point out three small mistakes in Dave's code, which of
course are certainly typos.

If one wants to not get only NAs in the result vector, one should
specify na.rm=TRUE in the apply statement, furthermore I guess that
the mean function has to be applied to 'demodis2' and not 'dimodis'
which is an object of class 'dist', finally there is no need to
transform 'demodis' again using the as.matrix function in the apply
statement:

demodis2<- as.matrix(demodis)   # make a full matrix copy
is.na(diag(demodis2))<- TRUE# ignore the dissimilarity of a
   # plot to itself
apply(demodis2,1,mean,na.rm=TRUE) # calculate the mean dissimilarity
   # of every plot to the others

I checked the code, it's now working.

Cheers,

François.

2010/12/9 Dave Roberts:

Burak,

I think your question is simpler than the suggestions of NMDS.  One
approach, let's say you dissimilarity matrix is called demodis


demodis2<- as.matrix(demodis)   # make a full matrix copy
is.na(diag(demodis2))<- TRUE# ignore the dissimilarity of a

   # plot to itself

apply(as.matrix(demodis),1,mean) # calcualte the mean dissimilarity

   # of every plot to the others


This is all done as part of function disana() in package labdsv along with
some other simple dissimilarity analyses.

Dave Roberts



On 11/23/2010 01:09 PM, Pekin, Burak K wrote:


Hello, I want to rank the dissimilarity of sites based on their species
composition. For example, I would like to be able to say that site A is less
similar in composition to the other sites than site B is similar to the
other sites. I could do a cluster analysis and look at which sites are less
closely clustered.

It would be even better if I could come up with a quantitative scale
rather than a relative ranking that would give a value for each site based
on its relative dissimilarity to the rest of the sites. So site A might
receive a 90 out of 100, whereas site B and C might receive a 60 and a 50
indicating the rank as well as 'relative quantity' of dissimilarity for each
site.

Thanks,
Burak

--

Burak K. Pekin, PhD
Postdoctoral Research Associate
Purdue University


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Dissimilarity ranking

2010-12-09 Thread Dave Roberts

Burak,

I think your question is simpler than the suggestions of NMDS.  One 
approach, let's say you dissimilarity matrix is called demodis


> demodis2 <- as.matrix(demodis)   # make a full matrix copy
> is.na(diag(demodis2)) <- TRUE# ignore the dissimilarity of a
   # plot to itself
> apply(as.matrix(demodis),1,mean) # calcualte the mean dissimilarity
   # of every plot to the others


This is all done as part of function disana() in package labdsv along 
with some other simple dissimilarity analyses.


Dave Roberts



On 11/23/2010 01:09 PM, Pekin, Burak K wrote:

Hello, I want to rank the dissimilarity of sites based on their species 
composition. For example, I would like to be able to say that site A is less 
similar in composition to the other sites than site B is similar to the other 
sites. I could do a cluster analysis and look at which sites are less closely 
clustered.

It would be even better if I could come up with a quantitative scale rather 
than a relative ranking that would give a value for each site based on its 
relative dissimilarity to the rest of the sites. So site A might receive a 90 
out of 100, whereas site B and C might receive a 60 and a 50 indicating the 
rank as well as 'relative quantity' of dissimilarity for each site.

Thanks,
Burak

--

Burak K. Pekin, PhD
Postdoctoral Research Associate
Purdue University


[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] some MDS and R problems

2010-06-16 Thread Dave Roberts

Arnaldo,

Given that NMDS operates essentially on rank distances, small 
variations in dissimilarity generally have low impact.  I rarely have 
points with zero dissimilarity on quantitative indices, but it sometimes 
happens on presence/absence based indices.  If you have a dissimilarity 
object named dis, I just do


> dis[dis==0] <- 0.0001
> res <- nmds(dis)

Following Jari's argument about the other point's contribution to 
stress, I think fudging one dissimilarity is a better solution than 
dropping out a point


Dave Roberts

Arnaldo Russo wrote:

Hi all,

I'm executing some multivariate exploratory annalysis, with vegan and MASS
packages.

Some problems are not solved for me at this moment. I have read all past
discussions over MDS problems.

In my studies occurs a situation that some samples are equal each other, and
this results in zero dissimilarities.

I tryed with no success, use the stepacross function. So, in this case I
changed one of these zero dissimilarities values for a minimum (1e-5). Some
of these modifications are different than "stepacross" or same isoMDS(x.dist
+ 1e-5)  (proposed by Jari oksanen to 'lie' to isoMDS? When I delete a
sample I do not loose some weight?

Changing one of those zero dissimilarities and using zerodist = "add" option
of metaMDS, it passed the current error (zero or negative distance between
objects).

Someone could explain the function posted by Gavin Simpson (Jan 12, 2010;
05:45pm Re: Non-metric multidimensional scaling (NMDS) help). I didn't get
some result.

If there is a good reason, and you want to include all samples, then
you'll need to come up with a means for handling them. metaMDSdist allow
you to add a small value to the zero dissimilarities. The details are in

the code, but effectively all zero distances are replaced by half the
smallest non zero distance. You could do a similar replacement yourself
if you feel this is warranted and/or justified.

minDij <- min(Dij[Dij > 0) / 2
Dij[Dij <= 0] <- minDij

Will do this replacement if Dij is your matrix (replace Dij with
whatever the name of your matrix is). Then supply the new matrix to
metaMDS. "




When I use unique function as #x.dist <- dist(unique(X))
this cutted some of my samples that I do not know. I can't see my variables
in the plot. Some help?

Cheers, Arnaldo.


epi<- read.table("epi.txt", h=TRUE,row.names=1)

library('vegan')
library('MASS')

epi.<- as.matrix(epi)

dissim1<- vegdist(epi., method="jaccard", binary=TRUE)
dissim1.mds<-metaMDS(dissim1,k=2)

plot(dissim1.mds,type="n")
text(dissim1.mds, labels=as.character(row.names(epi)))

#epi
samplesp1sp2sp3sp4sp5sp6sp7sp8sp9
sp10sp11sp12sp13sp14sp15sp16sp17sp18sp19
1E10100000000000
00000
2E00101000000000
00000
3E10100000000000
00000
4E10101000000000
00000
5E10000000000000
00000
6E10000000000000
00000
7M10100010000000
00000
8E10000010000000
00000
11E10001010000000
00000
12E00000100000000
00000
13E00000010001100
00000
14E00100000000100
00000
15E10000010000010
00000
17E00001010100000
00000
18E00000010000000
00000
20E10001000000000
00000
21E10000000000001
00000
22E10001000100000
00000
23E10111000000000
00000
27E10011000000000
00000
28E10001011001000
00000
30E11001000000000
0 

Re: [R-sig-eco] multivariate smoothing and gradient estimation

2009-09-16 Thread Dave Roberts

Chris,

One (sub-optimal) solution would be to fit GAMS and then have the 
gam.predict estimate values immediately near your data points which you 
could use to calculate a local gradient.  If the GAM is reasonably 
smooth, I would think you could get estimates that were reasonable.


Dave
--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

Chris Martin wrote:

Dear list members,

I am a looking for a function that can calculate a "surface" from at least
four predictor variables and one response variable. I would then like to
calculate the first derivative of a specific point on this surface.

I have looked at many packages for nonparametric smoothing and kernel
density estimation but have been unable to find any that fulfill both these
criteria. For example, loess can handle multivariate data, but I do not how
to extract the derivative from the resulting fit? Many smoothing splines
offer predict functions to extract the derivative, but these functions can
only handle univariate data (e.g. smooth.spline). Ideally, I would like to
use local estimates of the surface (i.e. loess).

I would appreciate any suitable functions or advice on where to look for
functions that fulfill both these criteria.

Thank you very much for your time.

best wishes,
Chris Martin
Population Biology Graduate Group '12
University of California, Davis

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


-

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Teaching materials

2009-08-26 Thread Dave Roberts
Volker,

You're welcome to take look at my site:

http://ecology.msu.montana.edu/labdsv/R

It's primarily for multivariate analysis in community ecology, bit does
also have some stuff on GLMs and GAMS in an ecological context.  You
might also visit with Hank Stevens at Miami as he's just next door to
you.

Dave  
-  

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460 



On Wed, 2009-08-26 at 16:39 -0400, Volker Bahn wrote:
> Hi all,
> 
> I'll be teaching a 400/600 level bio/ecostatistics class using R this 
> fall (using "Introductory Statistics with R" as text book). I was 
> wondering if anyone here had any teaching material (lecture slides, 
> exercises, homework, projects, code for teaching etc) that they would 
> like to share? Pointers to material on the web would also be welcome. I 
> found a few courses with course material on the web but they were 
> typically programming oriented and in health sciences rather than 
> ecology. If you'd rather reply off list, I'll summarize the responses 
> for the list.
> 
> Thanks,
> 
> Volker
> 
> Biological Sciences
> Wright State University
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Fuzzy-Set-Ordination for classifying plant species???

2009-08-24 Thread Dave Roberts
Dragos,

I'm sorry but I do not understand your question.  Are you creating a
dissimilarity matrix of lakes or species?  I suspect that a transposing
of your matrix might work, but I need to understand your problem better.
Please send more detail

Thanks, Dave

On Tue, 2009-08-18 at 02:55 -0700, Dragos Zaharescu wrote:
> 
> Hello,
> 
> I have a matrix with 180 plant species (variables, binary) and 270 rows 
> (altitude waterbodies). There is also one categorical variable (4 categories 
> representing similar lakes groups, which resulted from a prior analysis of 
> the transposed matrix).
> Is there any way to load plant species on those categories in 
> fuzzy-set-ordination analysis? Is there anyone here that has faced this 
> question before or could provide a hint? All my efforts so farhave lead to 
> loading the lakes on the categories (fso in R); but I want to load the plant 
> species.
> 
> I would greatly appreciate your help.
> 
> Dragos Zaharescu
> Vigo University
> 
> ~ You should be the change you want to see in the world~ Ghandi
> 
> 
>   
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
-- 

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Clustering large data

2008-10-24 Thread Dave Roberts

Thierry and Hadley,

Sorry to be late coming into this (I forgot I subscribed to sig-eco).

package labdsv has a function called matrify() which takes a three 
column data.frame (sample,taxa,abundance) and creates a full (sparse) 
matrix representation.  I've never tried it on a data set as large as 
yours, and I'm curious if it would work.  It's pure R, but if worst 
comes to worst I used to have a FORTRAN version that would probably 
work. Please give matrify a try and let me know.


Dave R.

matrify <- function (data)
{
if (ncol(data) != 3)
stop("data frame must have three column format")
plt <- data[, 1]
spc <- data[, 2]
abu <- data[, 3]
plt.codes <- levels(factor(plt))
spc.codes <- levels(factor(spc))
taxa <- matrix(0, nrow = length(plt.codes), ncol =
 length(spc.codes))
row <- match(plt, plt.codes)
col <- match(spc, spc.codes)
for (i in 1:length(abu)) {
taxa[row[i], col[i]] <- abu[i]
}
taxa <- data.frame(taxa)
names(taxa) <- spc.codes
row.names(taxa) <- plt.codes
taxa
}


hadley wickham wrote:

Hi Thierry,

Thanks for the more detailed report.  I think the new version of
reshape will help, but I just checked and it's current a total mess
and will need a lot of work before it's ready for anyone to try.
Unfortunately I'm unlikely to get to it until the ggplot2 book is
finished, so it might be a bit of a wait.

Hadley

On Tue, Oct 14, 2008 at 2:52 AM, ONKELINX, Thierry
<[EMAIL PROTECTED]> wrote:

Hi Hadley,

Here is a more elaborate report of what I did and what when wrong. The
example is not reproducible because the dataset is to large. A smaller
dummy dataset is not an option as it works with smaller datasets. I'm
willing to run the code again with a development version of reshape.

Cheers,

Thierry



library(RODBC)
library(reshape)

Loading required package: plyr

setwd("d:/wouter")
Sys.info()

sysname  release
  "Windows" "XP"
version nodename
"build 2600, Service Pack 2" "LHPA000838"
machinelogin
  "x86"   "thierry_onkelinx"
   user
 "thierry_onkelinx"

sessionInfo()

R version 2.7.2 (2008-08-25)
i386-pc-mingw32

locale:
LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Du
tch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] stats graphics  grDevices datasets  tcltk utils methods

[8] base

other attached packages:
[1] reshape_0.8.1  plyr_0.1   RODBC_1.2-3svSocket_0.9-5
svIO_0.9-5
[6] R2HTML_1.59svMisc_0.9-5   svIDE_0.9-5

loaded via a namespace (and not attached):
[1] tools_2.7.2

channel <- odbcConnectAccess("db1.mdb")
km <- sqlQuery(channel = channel, query = "SELECT KMhokcode AS

Location, TaxonFK AS Species FROM kmhok_periode2_selectie ORDER BY
KMhokcode, TaxonFK", as.is = TRUE)

odbcCloseAll()
km$value <- 1
dim(km)

[1] 1157024   3

length(unique(km$Location))

[1] 6354

length(unique(km$Species))

[1] 1381

system.time(tmp <- cast(Location ~ Species, data = km[1:1000, ], fill

= 0))
  user  system elapsed
  0.110.000.17

system.time(tmp <- cast(Location ~ Species, data = km[1:1, ], fill

= 0))
  user  system elapsed
   1.7 0.0 1.7

system.time(tmp <- cast(Location ~ Species, data = km[1:10, ],

fill = 0))
  user  system elapsed
 46.420.45   47.02

system.time(tmp <- cast(Location ~ Species, data = km, fill = 0))

Error: cannot allocate vector of size 33.5 Mb
Timing stopped at: 322.95 3.43 327.4

system.time(tmp <- table(km$Location, km$Species))

  user  system elapsed
  1.100.001.11





ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
[EMAIL PROTECTED]
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-Oorspronkelijk bericht-
Van: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Namens hadley wickham
Verzonden: vrijdag 10 oktober 2008 14:40
Aan: ONKELINX, Thierry
CC: r-sig-ecology@r-project.org
Onderwerp: Re: [R-sig-eco] Clustering large data


Thanks for your responses. The biggest problem seems to be cast() for
the reshape package which could not handle the dataset. Peter's

s