Re: [R-sig-eco] mvpart alternatives and machine learning multivariate analysis

2017-07-01 Thread Gavin Simpson
If you want to stick with, or at least compare, a multivariate tree with
other approaches (like CCA or mvabund or boral), then the partykit
package provides an implementation of a multivariate tree model using
conditional inference trees (i.e. you use conditional inference to
decide if splits are "significant" and stop splitting when they aren't,
rather than the classic approach of building a big tree and pruning it
back with cost-complexity pruning and cross-validation).

An illustration with the classic hunting spider data set (which was
used in the mvpart paper too IIRC) can be found at the end of one of
the vignettes supplied with partykit:
https://cran.r-project.org/web/packages/partykit/vignettes/ctree.pdf

When I used this recently to replicate an analysis from a book chapter
I wrote a few years back, the partykit implementation worked well
after a little fiddling, but some of the plot options for the terminal
nodes don't work so well if your species response matrix has many
columns in it (i.e. lots of species).
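
A minimal sketch of what such a fit looks like, assuming simulated data
rather than the hunting spider data used in the vignette. The species and
environment names (`sp1`, `water`, ...) are made up for illustration; the
only partykit call used is `ctree()` with several responses joined by `+`
on the left-hand side of the formula.

```r
library(partykit)

set.seed(1)
n <- 100

# Hypothetical environmental covariates
env <- data.frame(water = runif(n), sand = runif(n))

# Hypothetical species abundances: sp1 and sp2 respond to water, sp3 is noise
spp <- data.frame(
  sp1 = rpois(n, lambda = ifelse(env$water > 0.5, 8, 1)),
  sp2 = rpois(n, lambda = ifelse(env$water > 0.5, 1, 6)),
  sp3 = rpois(n, lambda = 3)
)
dat <- cbind(spp, env)

# Multivariate conditional inference tree: all three species are the response
mvtree <- ctree(sp1 + sp2 + sp3 ~ water + sand, data = dat)
print(mvtree)
```

Splitting stops when no covariate shows a significant association with the
response matrix, so there is no separate pruning step.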

HTH

Gavin

On 24 June 2017 at 10:35, Manuel Spínola  wrote:
> Thank you very much Ralf.
>
> I am looking for alternatives to the classic CCA community analysis, and I
> was thinking into machine learning techniques.
>
> Manuel
>
> 2017-06-24 10:10 GMT-06:00 Ralf Schäfer :
>
>> Indeed! To expand on this: if you need a tutorial for mvabund, we once
>> analysed some categorical multivariate data and provided a tutorial:
>> http://dx.doi.org/10.1007/s10646-015-1421-0
>> Paper and tutorial are freely available on researchgate
>>
>> There are also many other methods, but to point you to some, it would be
>> good if you were more specific than "community-environment relationships".
>>
>> Best regards
>> Ralf
>>
>>
>> On 24.06.2017 at 18:02, Torsten Hauffe wrote:
>>
>> As far as I remember, on Windows you will need to download and install the
>> Rtools first (https://cran.r-project.org/bin/windows/Rtools/) because
>> some parts of mvpart need to be compiled.
>>
>> You can analyse multiple species-environment relationships with the
>> mvabund package. This is not fancy machine-learning but solid likelihood
>> statistic.
>>
>> HTH,
>> Torsten
>>
>> On 24 June 2017 at 11:57, Ralf Schäfer  wrote:
>>
>>> Manuel,
>>>
>>> I just checked, it is currently still compatible. So you can download
>>> from the archive and install from source - at least on Linux and OS X, not
>>> sure about Windows.
>>> See Session information below:
>>>
>>> > R version 3.4.0 (2017-04-21)
>>> > Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>> > Running under: macOS Sierra 10.12.5
>>> >
>>> > Matrix products: default
>>> > BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/
>>> libRblas.0.dylib
>>> > LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/
>>> libRlapack.dylib
>>> >
>>> > locale:
>>> > [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
>>> >
>>> > attached base packages:
>>> > [1] stats graphics  grDevices utils datasets  methods   base
>>> >
>>> > other attached packages:
>>> > [1] mvpart_1.6-2
>>> >
>>> > loaded via a namespace (and not attached):
>>> > [1] compiler_3.4.0
>>>
>>>
>>> However, there are certainly other packages that can partition
>>> multivariate ecological data, though I am not aware of other packages for
>>> use with multivariate regression trees.
>>>
>>> Regards
>>> Ralf
>>>
>>>
>>>
>>> > On 24.06.2017 at 17:47, Manuel Spínola wrote:
>>> >
>>> > Thank you Ralf,
>>> >
>>> > But I guess is not going to be newer versions and could be incompatible
>>> with newer version of R, so at some moment there will be no accessibility
>>> to the package.
>>> >
>>> > Manuel
>>> >
>>> > 2017-06-24 5:25 GMT-06:00 Ralf Schäfer:
>>> > Dear Manuel
>>> >
>>> > despite it has been removed, it should still work.
>>> > At least I used it last year - just install the version from the
>>> archive manually:
>>> > https://cran.r-project.org/src/contrib/Archive/mvpart/
>>> >
>>> > Best regards
>>> > Ralf
>>> >
>>> >
>>> >> Date: Fri, 23 Jun 2017 13:59:13 -0600
>>> >> From: Manuel Spínola <mspinol...@gmail.com>
>>> >> To: "r-sig-ecology@r-project.org "
>>> >
>>> >> Subject: [R-sig-eco] mvpart alternatives and machine learning
>>> >>  multivariate analysis
>>> >> Message-ID:
>>> >>  

Re: [R-sig-eco] mvpart alternatives and machine learning multivariate analysis

2017-07-01 Thread Gavin Simpson
Right, but mvabund fits quite different types of model to that fitted
in a multivariate tree, which can be seen as an unsupervised
constrained clustering of the multivariate species response matrix.

All the best

Gavin

On 24 June 2017 at 10:02, Torsten Hauffe  wrote:
> As far as I remember, on Windows you will need to download and install the
> Rtools first (https://cran.r-project.org/bin/windows/Rtools/) because some
> parts of mvpart need to be compiled.
>
> You can analyse multiple species-environment relationships with the mvabund
> package. This is not fancy machine-learning but solid likelihood statistic.
>
> HTH,
> Torsten
>
> On 24 June 2017 at 11:57, Ralf Schäfer  wrote:
>
>> Manuel,
>>
>> I just checked, it is currently still compatible. So you can download from
>> the archive and install from source - at least on Linux and OS X, not sure
>> about Windows.
>> See Session information below:
>>
>> > R version 3.4.0 (2017-04-21)
>> > Platform: x86_64-apple-darwin15.6.0 (64-bit)
>> > Running under: macOS Sierra 10.12.5
>> >
>> > Matrix products: default
>> > BLAS: /Library/Frameworks/R.framework/Versions/3.4/
>> Resources/lib/libRblas.0.dylib
>> > LAPACK: /Library/Frameworks/R.framework/Versions/3.4/
>> Resources/lib/libRlapack.dylib
>> >
>> > locale:
>> > [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
>> >
>> > attached base packages:
>> > [1] stats graphics  grDevices utils datasets  methods   base
>> >
>> > other attached packages:
>> > [1] mvpart_1.6-2
>> >
>> > loaded via a namespace (and not attached):
>> > [1] compiler_3.4.0
>>
>>
>> However, there are certainly other packages that can partition
>> multivariate ecological data, though I am not aware of other packages for
>> use with multivariate regression trees.
>>
>> Regards
>> Ralf
>>
>>
>>
>> > On 24.06.2017 at 17:47, Manuel Spínola wrote:
>> >
>> > Thank you Ralf,
>> >
>> > But I guess is not going to be newer versions and could be incompatible
>> with newer version of R, so at some moment there will be no accessibility
>> to the package.
>> >
>> > Manuel
>> >
>> > 2017-06-24 5:25 GMT-06:00 Ralf Schäfer:
>> > Dear Manuel
>> >
>> > despite it has been removed, it should still work.
>> > At least I used it last year - just install the version from the archive
>> manually:
>> > https://cran.r-project.org/src/contrib/Archive/mvpart/
>> >
>> > Best regards
>> > Ralf
>> >
>> >
>> >> Date: Fri, 23 Jun 2017 13:59:13 -0600
>> >> From: Manuel Spínola
>> >> To: "r-sig-ecology@r-project.org "
>> >
>> >> Subject: [R-sig-eco] mvpart alternatives and machine learning
>> >>  multivariate analysis
>> >> Message-ID:
>> >>  

Re: [R-sig-eco] accounting for linear sampling structure using PERMANOVA or dbRDA

2017-01-19 Thread Gavin Simpson
Hi Tim,

It sounds like you'd be best served by modelling the transect spatial
position and using free permutations. Spatial eigenvectors could be
used for example to model the transect position effect if you need a
more complex effect than a simple linear or polynomial function.
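
A rough sketch of that idea, assuming a made-up design that mirrors the one
described in the question (6 sites, 10 plants each). The site positions,
the simulated community matrix and the object names are all hypothetical;
the vegan functions used are `adonis2()`, `pcnm()` and `scores()`.

```r
library(vegan)

set.seed(42)

# Hypothetical design: 6 irregularly spaced sites, 10 plants per site
site_pos <- c(0, 12, 20, 45, 60, 71)          # made-up positions along the transect
site     <- rep(1:6, each = 10)
dat <- data.frame(
  pos   = site_pos[site],
  plant = factor(rep(c("A", "B"), each = 30))
)

# Simulated insect counts (+1 so that no plant has an empty community)
comm <- matrix(rpois(60 * 8, lambda = 3) + 1, nrow = 60,
               dimnames = list(NULL, paste0("sp", 1:8)))

# Option 1: linear effect of transect position, free permutations of plants
mod_lin <- adonis2(comm ~ pos + plant, data = dat,
                   by = "margin", permutations = 999)

# Option 2: a spatial eigenvector (first PCNM of the site positions)
pc <- pcnm(dist(site_pos))
dat$pcnm1 <- scores(pc)[site, 1]
mod_pcnm <- adonis2(comm ~ pcnm1 + plant, data = dat,
                    by = "margin", permutations = 999)

mod_lin
mod_pcnm
```

Because the permutations are unrestricted here, the number of plants per
site does not have to be equal.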

HTH

G

On 13 January 2017 at 22:53, Tim O'Connor <t...@berkeley.edu> wrote:
> Hello everyone,
>
> I’m trying to assess the effect of a factor on community structure while 
> controlling for confounded spatial effects.
>
> I sampled herbivorous insect communities along transects spanning a contact 
> zone between two host plants, A and B, with 6 sites per transect (e.g., 
> start-A-A-A-B-B-B-end) and 10 plants per site. Each plant was censused for 
> insects separately, so I began with 60 total communities per transect. The 
> transects are basically linear, but sites are irregularly spaced. I would 
> like to quantify the effect of plant type on insect community while 
> controlling for possible environmental or spatial effects due to transect 
> position.
>
> At the moment I attempt this with a PERMANOVA (or equivalently, dbRDA), 
> permuting plant type among sites and finding the marginal effect of plant in 
> a model that includes position.
>
> ctrl <- how(complete = TRUE,
> within = Within(type = "none"),
> plots = Plots(type = "free", strata = site))
> perms <- shuffleSet(nobs(communities), control = ctrl)
> adonis2(vegdist(communities) ~ position + plant, permutations = perms, by = 
> "margin")
>
> Although a linear permutation scheme seems most justified and would account 
> for the adjacency of sites, there are only 5 such permutations for a transect 
> of 6 sites (aside from the observed arrangement). Allowing free permutation 
> of sites improves total permutations (up to 719) but no longer includes 
> spatial information.
>
> I have two questions. First, is this approach correct in principle? Does it 
> seem overly or underly conservative? Second, are there other approaches I 
> should consider, especially those that allow uneven observations among sites? 
> The challenge with the method I describe is that my final data set includes 
> different numbers of plants per site due to data cleaning. The constrained 
> permutations require me to sacrifice at least 1/3 of my cleaned data to 
> ensure the same number of observations per site.
>
> Thanks for any suggestions.
>
> Best,
> Tim
>
> ---
> Tim O'Connor
> PhD Student
> Whiteman Laboratory
> Integrative Biology
> University of California, Berkeley
> http://noahwhiteman.org/tim-oconnor.html
>
>
> [[alternative HTML version deleted]]
>
> ___
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



-- 
Gavin Simpson, PhD

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Re: [R-sig-eco] RDA

2015-11-26 Thread Gavin Simpson
Yes, that is the appropriate citation for the software/implementation (and
in the case of capscale() the method, I think; there are other variants of
the idea around). `adonis()` is based on permutational MANOVA (PERMANOVA);
references to the papers that introduced this method are in `?adonis`, and
you may choose to cite those as well.
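
For reference, a short sketch of how to pull the citation for whichever
version of vegan is actually installed, using only base R utilities:

```r
# Citation for the installed version of the package
citation("vegan")

# The same entry in BibTeX form, plus the version number to report
toBibtex(citation("vegan"))
packageVersion("vegan")
```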

HTH

Gavin

On 26 November 2015 at 14:13, Richards, Christina <c...@usf.edu> wrote:

> Hi Dr. Oksanen!
>
> Its very gracious of you to respond. On another important note, this is
> the citation suggested for the vegan package:
>
> Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O'Hara RB,
> Simpson GL, Solymos P, Stevens MHH, Wagner H. (2015) vegan: Community
> Ecology Package. R package version 2.2-1.
> http://CRAN.R-project.org/package=vegan
>
> Is this the most appropriate for our purposes (using Adonis and capscale
> in Vegan)?
>
> Christina Richards, Ph.D.
> University of South Florida
> Department of Integrative Biology
> 4202 East Fowler Avenue SCA 127
> NES 107 (shipping)
> Tampa, FL 33620
> (813)974-5090
> (813)974-3263 FAX
> http://www.ecologicalepigenetics.com
> Twitter: @EcolEpig
> Facebook: Ecological Epigenetics
> 
> From: Jari Oksanen <jari.oksa...@oulu.fi>
> Sent: Thursday, November 26, 2015 1:15 PM
> To: Marcelino de la Cruz
> Cc: Richards, Christina; r-sig-ecology@r-project.org; Robertson, Marta;
> Verhoeven,  Koen; Alvarez, Mariano; Foust, Christy
> Subject: Re: [R-sig-eco] RDA
>
> > On 26 Nov 2015, at 20:05 pm, Marcelino de la Cruz <
> marcelino.delac...@upm.es> wrote:
> >
> > On 26/11/2015 at 16:27, Richards, Christina wrote:
> >> Hello!
> >>
> >> That is very helpful and seems to work! Thank you!!
> >>
> >> I did not realize we could use raw data in capscale,
> > We could and we should! Currently, it is the only way to achieve it.
> >
> > is this true only because it is conditioning variables?
> > It seems that the current implementation of capscale only accepts a
> single *data.frame* for both explanatory and conditioning variables
> >
> Yes, this is true: current and *future* implementations of
> capscale/rda/cca (and future dbrda) will only have one data= argument.
> However, in addition to variables in the data frame given in data=, you can
> mix variables in the work environment in your formula. If you think you
> need to have several data frames for the data= argument, please consider
> cbind().
>
> cheers, Jari Oksanen
>
>



-- 
Gavin Simpson, PhD


Re: [R-sig-eco] glm(binomial) vs. logistf

2015-10-29 Thread Gavin Simpson
If it is Firth's procedure that you are after, the **brglm** package does
that and has most if not all of the standard methods for models, including
a `predict()` method.

You might also wish to consider the **arm** package and its `bayesglm()`
function, which employs different priors that also handle the separation
issue in binomial GLMs. The reference cited in `?arm::bayesglm` has some
discussion of this.
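
A small sketch of both options, assuming made-up data that mimic the
situation in the question (8 presences in 100 samples, all at one level of
the predictor). The data frame and variable names are hypothetical; the
fitting functions are `brglm::brglm()` and `arm::bayesglm()`.

```r
library(brglm)
library(arm)

# Hypothetical sparse data with quasi-separation: no presences in habitat "a"
d <- data.frame(
  habitat = factor(rep(c("a", "b"), each = 50)),
  present = c(rep(0, 50), rep(c(1, 0), times = c(8, 42)))
)

# Ordinary ML fit: the habitat coefficient drifts towards infinity
m_glm <- glm(present ~ habitat, family = binomial, data = d)

# Bias-reduced (Firth-type) fit and a weakly informative prior fit
m_firth <- brglm(present ~ habitat, family = binomial, data = d)
m_bayes <- bayesglm(present ~ habitat, family = binomial, data = d)

# Both inherit from "glm", so predict() works as usual
newd <- data.frame(habitat = factor(c("a", "b"), levels = c("a", "b")))
predict(m_firth, newdata = newd, type = "response")
predict(m_bayes, newdata = newd, type = "response")
```

The ordinary `glm()` fit will warn about fitted probabilities of 0 or 1;
that warning is the symptom the other two fits are meant to address.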

HTH

G

On 29 October 2015 at 14:45, Drew Tyre <aty...@unl.edu> wrote:

> After just a quick look I think one reason is that objects created with
> logistf() don't have as many methods for them. For example, I frequently
> use the predict() method with fitted models, and there is no predict method
> for logistf fits. Doesn't mean there couldn't be, but the code hasn't been
> written yet.
>
> --
> Drew Tyre
>
> School of Natural Resources
> University of Nebraska-Lincoln
> 416 Hardin Hall, East Campus
> 3310 Holdrege Street
> Lincoln, NE 68583-0974
>
> phone: +1 402 472 4054
> fax: +1 402 472 2946
> email: aty...@unl.edu
> http://snr.unl.edu/tyre
> http://aminpractice.blogspot.com
> http://www.flickr.com/photos/atiretoo
>
> -Original Message-
> From: R-sig-ecology [mailto:r-sig-ecology-boun...@r-project.org] On
> Behalf Of Martin Weiser
> Sent: Thursday, October 29, 2015 2:11 PM
> To: r-sig-ecology@r-project.org
> Subject: [R-sig-eco] glm(binomial) vs. logistf
>
> Dear friends,
>
> Is there any reason why to run logistic regression (binomial response) by
> glm() and not by logistf() by default? In particular when having sparse
> data (e.g. 8 presences in 100  samples), frequently with quasi-separation
> (all presences at one level of the predictor, together with many absences).
>
> I tried to read some papers by G. Heinze - I did not get the whole thing,
> but it seems to me that both terms estimation and testing procedure should
> be more reliable using logistf(). Am I wrong?
>
> So, is there any reason why to use binomial glm?
> I am sorry for my ignorance - there should be a reason why people stick to
> glm() - I just do not know what it is. Could you explain it to me or point
> me to something to read, please? I am not a statistician by training,
> however.
>
> Thank you for your patience.
>
> Kind regards,
> Martin W.
>
>
>
>
>
>



-- 
Gavin Simpson, PhD


Re: [R-sig-eco] cross validated CCA

2015-08-24 Thread Gavin Simpson
Hi Jesse,

I have most if not all the code from the ter Braak & Schaffers paper ported
to R in some state or other. Steve Juggins sent me the CCA-PLS port IIRC
but I've been distracted with other things and this method is not in
cocorresp (yet! but I've been saying that for years :-( ). We don't have
anything in Vegan that will do CV for you.

It wouldn't be the most efficient of codes but it would be relatively
simple to do CV by hand with Vegan. IIRC the CV for CoCA is leave-one-out,
so you just need to loop over the rows, fit the cca to all the data minus
the ith row and then predict for the ith row and compare with the observed.

I've just gotten back from a few weeks out of the office so I'm going to be
tied up for a wee while clearing the backlog but if you can't get this
working get back in touch and I'll see if I can flesh these ideas out a bit
more in code for you.

All the best

G

On 14 August 2015 at 16:13, Jesse Becker jcbecke...@gmail.com wrote:

 A couple of years ago I was doing some Co-Correspondence analysis and asked
 if the cross validation method in ter Braak and Schaffers (2004) which is
 available in the cocorresp package had been implemented to work on a CCA
 output from vegan? I ended up not needing it for that project, but I'm
 using the method for another paper.  In an effort to put CoCA and CCA on
 the same footing in terms of evaluating fit, I'd like to use the cross
 validation on CCA.  I don't have access to MATLAB, I do have access to R
 (and am a user, not a programmer)

 So has anyone adapted the MATLAB code to R?  Gavin?

 Thanks in advance!

 Jesse C. Becker, Ph.D.
 765.285.8889 office
 512.587.4428 cell
 jcbecker at bsu.edu
 jcbecker42 at gmail.com





-- 
Gavin Simpson, PhD



Re: [R-sig-eco] biplot question: Error in 1L:n : argument of length 0

2015-07-29 Thread Gavin Simpson
Use the `plot()` method provided by vegan. You won't get biplots from an
NMDS, as the method throws away the species data when it converts the data
to dissimilarities before generating the k-d mapping. vegan allows species
scores to be added (as the weighted averages of the site scores) if the raw
data were given to metaMDS(), but these and the site scores don't form a
traditional biplot if you mean that term specifically.
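
A minimal sketch of that workflow, assuming the `dune` data that ship with
vegan in place of the poster's own data:

```r
library(vegan)

data(dune)
set.seed(1)

# Raw community data go in, so weighted-average species scores are available
ord <- metaMDS(dune, trace = FALSE)

# Build the plot up in layers: empty frame, then sites, then species labels
plot(ord, type = "n")
points(ord, display = "sites")
text(ord, display = "species", cex = 0.7)

# The scores themselves, if you want to draw the plot by other means
site_sc <- scores(ord, display = "sites")
spp_sc  <- scores(ord, display = "species")
```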

G

On 29 July 2015 at 10:36, Michael Marsh sw...@blarg.net wrote:

 I am trying to obtain biplots of NMDS results, ideally like the
 rpart.pca() result with mvpart.
 Can someone easily tell me why I get this error:
 Error in 1L:n : argument of length 0
 from this script, and graphical output (below) lacking labels for either
 sites or species?
 x and y matrices are shown below script.
 modifying y by removing NaN rows did not change output.
 Thanks!,
 Mike Marsh
 sw...@blarg.net

 --
 Q.wd <- as.data.frame(read.table(file.choose(),header=T))
  #text dataset is Q.WD.foliar.revised.txt

  library(vegan)
  Q09shrub.min <- vegtab(Q09shrub,min=2)
  Q09shrub.std <- decostand(Q09shrub.min, method="max")
  Q09shrub.dist <- dist(Q09shrub.std)
  Q09shrub.ward <- hclust(Q09shrub.dist,method="ward",members=NULL)
   NMDS.Q09shrub <- metaMDS(Q09shrub)

 Wisconsin double standardization
 Run 0 stress 0.08176961
 Run 1 stress 0.1011811
 Run 2 stress 0.07507627
 ... New best solution
 ... procrustes: rmse 0.04824229  max resid 0.1222105
 Run 3 stress 0.1011877
 Run 4 stress 0.07507602
 ... New best solution
 ... procrustes: rmse 0.0001471213  max resid 0.0002974951
 *** Solution reached

  x <- plot(NMDS.Q09shrub, "sites")
  y <- plot(NMDS.Q09shrub, "species")
  biplot(x,y)
 Error in 1L:n : argument of length 0
  ordicluster(NMDS.Q09shrub, Q09shrub.ward)
 
  x
 $sites
  NMDS1   NMDS2
 9Q05-ST -0.7781618 -0.05087985
 9Q06-VS  0.6462465 -0.41377399
 9Q12-ST -0.3126179 -0.80107514
 9Q15ECS -0.8780461  0.20632944
 9Q15WST -0.7672079  0.8850
 9Q16-VS  0.8652229  0.56684307
 9Q19-ST  0.1374252 -0.41322443
 9Q24-ST  0.3140148 -0.12115606
 9Q26-ST -1.0044059 -0.15339828
 9Q29-DS  0.9608797  0.35394121
 9Q33-VS  0.8166505 -0.06138349

 $species
 NULL

 attr(,class)
 [1] ordiplot
  y
 $sites
 NULL

 $species
  NMDS1   NMDS2
 ACMA3  NaN NaN
 AMAL2  -0.05672919 -0.81171544
 ARRI2   0.84801299  0.24651319
 ARTR2  -0.93597835  0.02417604
 ARTR4  -1.01337771 -0.05343535
 GRSP   NaN NaN
 CHVI8  -0.74187745 -0.02949452
 ERBL2  NaN NaN
 ERNA10 -0.61157997  0.47464632
 NEST5   0.87499085 -0.04070596
 PREM   -0.90401815  1.25730322
 PRVI   NaN NaN
 PUTR2   0.16079095 -0.36676073
 RICE   -0.35775047 -0.46386377
 ROWO   NaN NaN
 SYAL2  NaN NaN
 SYOR2  -0.36836460 -1.13451214

 attr(,class)
 [1] ordiplot
 





-- 
Gavin Simpson, PhD



Re: [R-sig-eco] distances in NMDS ordination space

2015-07-16 Thread Gavin Simpson
Hi Kate,

The Euclidean distances between points in the NMDS ordination are an
approximation to the rank ordering of the original distances. Hence I would
consider whether the (approximate) rank ordering of the original distances
is the correct metric for the thing you want to include in your model. You
would also need to consider the stress of the solution, the error in the
mapping.

I'm not convinced that NMDS distances are better than embedding the
original distances in a Euclidean space using PCoA. Each has difficulties
(ranks vs imaginary eigenvalues).
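
A short sketch of how one might look at both options side by side, assuming
the `dune` data from vegan and Bray-Curtis dissimilarities purely as an
illustration:

```r
library(vegan)

data(dune)
d <- vegdist(dune)                       # original (Bray-Curtis) dissimilarities

# NMDS: ordination distances approximate the rank order of d
set.seed(1)
nmds   <- metaMDS(d, k = 2, trace = FALSE)
d_nmds <- dist(scores(nmds, display = "sites"))
nmds$stress                              # error in the mapping
cor(as.vector(d), as.vector(d_nmds), method = "spearman")

# PCoA: metric embedding of d; negative eigenvalues flag the non-Euclidean part
pco   <- cmdscale(d, k = 2, eig = TRUE)
d_pco <- dist(pco$points)
sum(pco$eig < 0)                         # number of negative eigenvalues
cor(as.vector(d), as.vector(d_pco))
```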

HTH

Gavin

On 16 July 2015 at 13:19, Kate Boersma kateboer...@gmail.com wrote:

 Hi all.

 I have a methodological question regarding non-metric multidimensional
 scaling. This is not specific to R. Feel free to refer me to another
 venue/resource if there is one more appropriate to my question.

 Correct me if I'm wrong: NMDS axes are non-metric, which is why NMDS
 frequently makes sense for community data, but it also means that distances
 in NMDS ordination space cannot be interpreted simplistically as they can
 in eigenvalue-based methods like PCA. This is why it is inadvisable
 (meaningless) to use NMDS axes as response variables in a linear modeling
 framework (e.g., with environmental variables as predictors).

 My question is this: Does that mean that it is also inadvisable to use
 distances among points in ordination space as response variables?

 My (potentially flawed) understanding: While the coordinates may not make
 sense in isolation, they should be meaningful relative to each other. In a
 2D ordination, if communities A & B are closer together in ordination space
 than communities C & D, that means they have more similar species
 compositions. Therefore, I should be able to predict the distance between
 points in a linear modeling framework.

 Alternately, I could use the actual distances among communities from my
 dissimilarity matrix with a method like db-RDA. But I used NMDS over RDA or
 CCA for a reason. It seems more straightforward to use the distances from
 my NMDS ordination instead of generating new coordinates from a PCoA to fit
 an RDA framework (as in db-RDA)... but this logic only works if NMDS
 distances are informative.

 Are these comparable analyses? If not, why not?

 I'd love your opinions.

 Thank you,
 Kate

 --
 Kate Boersma, PhD
 Department of Biology
 University of San Diego
 5998 Alcala Park
 San Diego CA 92110
 kateboer...@gmail.com
 http://www.oregonstate.edu/~boersmak/





-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Building glmms to handle zero-inflated continuous data in R - what options are available? (especially relating to hurdle/mixture models)

2015-06-19 Thread Gavin Simpson
How complex are the random effects? If they are relatively simple, give the
mgcv package a look. Its gam() function can fit Tweedie models, optimising
over the Tweedie power parameter too, and you can include random effects via
splines using `bs = "re"`.
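
A minimal sketch of such a model, assuming simulated zero-inflated
continuous data with one smooth covariate and one grouping factor. The
variable names are hypothetical; the mgcv pieces used are `gam()`, the
`tw()` family, the `"re"` basis and `rTweedie()` for simulation.

```r
library(mgcv)

set.seed(1)
n <- 300
d <- data.frame(
  x = runif(n),
  g = factor(sample(letters[1:10], n, replace = TRUE))
)

# Hypothetical truth: smooth effect of x plus a random intercept per group
re <- rnorm(10, sd = 0.3)
mu <- exp(0.5 + sin(2 * pi * d$x) + re[as.integer(d$g)])

# Tweedie response with 1 < p < 2: continuous, with exact zeros
d$y <- rTweedie(mu, p = 1.5, phi = 1)

# tw() estimates the power parameter; s(g, bs = "re") is a random intercept
m <- gam(y ~ s(x) + s(g, bs = "re"),
         family = tw(), data = d, method = "REML")
summary(m)
```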

G

On 17 June 2015 at 15:57, Karan Odom kjo...@gmail.com wrote:

 Hi,

 I have a zero-inflated continuous data set and want to build a glmm in R to
 analyze it (I have both fixed and random effects). However, because my data
 are continuous, I am discovering that this is not a simple task.
 Zero-inflation options in glmmADMB are not appropriate because my data are
 continuous and I don't know what other packages exist that allow for
 zero-inflated glmms with continuous data.

 I tried implementing the Tweedie distribution using packages tweedie and
 cplm, but these are a poor fit to my data.

 I think hurdle or mixture models might be especially useful for my data.
 When I modeled the non-zero continuous data separately from the
 zero/non-zero data, I get a very good fit to the data. However, I am stuck
 at how to integrate the two models. There seem to be packages in R that do
 this for count data but I have not found them for continuous data.

 I have been reading previous r-sig-ecology posts about this and find a lot
 of information from 2008-2012. I was wondering in the last few years if
 there have been developments in and if there are now available: (1)
 packages or techniques for easily implementing glmms for zero-inflated data
 in R, and (2) are there any good packages for mixture or hurdle models in R
 that allow for continuous data (i.e., how can I integrate the two models
 for the zero/non-zero versus non-zero continuous data)?

 Thank you very much for any help!
 Karan

 --
 Karan J. Odom
 Ph.D. Candidate, Biological Sciences
 University of Maryland, Baltimore County
 1000 Hilltop Circle
 Baltimore, MD 21250





-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Partitioning spatial effects using trend surface analysis or PCNM

2015-05-05 Thread Gavin Simpson
Hi trichter

On 5 May 2015 at 13:34, trichter trich...@uni-bremen.de wrote:
<snip />

 Here is what i do:

 spat <- as.data.frame(poly(as.matrix(spatxy), degree=3))

 cca1_s <- cca(OTU~., data=spat)
 #significances
 anova(cca1_s)
 anova(cca1_s, by="term", perm=999)


I don't think the last analysis makes much sense; if you have cubic
polynomials plus interactions, you should consider only the interaction terms
for removal first, and then decide whether quadratic rather than cubic terms
are needed.



 #forward selection for most parsimonious model
 cca1_s.f <- ordistep(cca(OTI~1, data=spat), scope=formula(cca1_s),
 direction="forward", pstep=1000)
 sig1_s.f <- anova(cca1_s.f, by="term", perm=999)


Again, as above, you have to be very careful with this. Just because you
made a matrix with 9 covariates it doesn't mean it makes sense to cherry
pick from these terms.


 The result is a significant CCA object. Spat is usuable in VarPart and
 yields a low but significant value for overall autocorrelation.

 For PCNM i do

 rs <- rowSums(OTU)/sum(OTU)
 pcnmw <- pcnm(dist(spatxy), w = rs)
 cca1_pcnm <- cca(acido1 ~ scores(pcnmw))

 pcnmw consists of 250 vectors, and the result is a non-significant CCA
 object, where i expected a finer spatial decomposition.


You are supposed to choose from among the set of PCNMs which explain the
species data best, not use them all in the model. The problem appears to be
that you have a model that is far too complex with lots of redundant axes
(or more likely too few constraints).

One suggestion is to use only those PCNMs that have positive spatial
correlation. Compute that using Moran's I of which there are a few
implementations around in various R packages. You can do CCA analysis with
the positive spatial correlation PCNMs separately from the negatively
correlated PCNMs if you wish.

You will probably need to do some type of forward selection but the
preferred method seems to be limited to RDA (because the adjusted R2
measure used in the global significance test isn't worked out for CCA). If
you skip the global test, you could just do forward selection on the
positive PCNMs, but you probably want to try to control for accepting too
many PCNMs by having low entry threshold for significance.
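
A rough sketch of that last suggestion, assuming the oribatid mite data
(`mite`, `mite.xy`) from vegan in place of the poster's data. Moran's I is
computed by hand here with a simple binary neighbour matrix (pairs closer
than the PCNM truncation threshold), which is only one of several possible
choices; the 0.01 entry threshold is likewise just an example.

```r
library(vegan)

data(mite)
data(mite.xy)

# PCNMs from the site coordinates
pc   <- pcnm(dist(mite.xy))
vecs <- as.data.frame(scores(pc))

# Binary neighbour matrix: sites closer than the truncation threshold
W <- (as.matrix(dist(mite.xy)) <= pc$threshold) * 1
diag(W) <- 0

# Moran's I for a single vector z under weights W
moran <- function(z, W) {
  z <- z - mean(z)
  length(z) / sum(W) * sum(W * outer(z, z)) / sum(z^2)
}
I <- vapply(vecs, moran, numeric(1), W = W)

# Keep PCNMs with positive spatial correlation (I above its null expectation)
pos <- vecs[, I > -1 / (nrow(vecs) - 1), drop = FALSE]

# Forward selection among the positive PCNMs with a strict entry threshold
set.seed(1)
m0  <- cca(mite ~ 1, data = pos)
m1  <- cca(mite ~ ., data = pos)
sel <- ordistep(m0, scope = formula(m1), direction = "forward",
                Pin = 0.01, trace = FALSE)
sel
```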

HTH

G



 The same is true if i am using total count data (hellinger transformed or
 not).

 I am sure i am doing it wrong, so if you have advise to properly do the
 calculation, please let me know. Thank you for the help.









-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Using multiple species data for gam

2015-02-10 Thread Gavin Simpson
mvabund has a manyany() function which allows you to run the same sort of
analysis as manyglm() does without having to use a GLM. Hence you could do
a many GAM using manyany() and the mgcv::gam() function(ality). There is an
example of this on the ?manyany help page.

Still, doing this for a 1000 species is going to be tough going, even if
you just used manyglm() but it may be doable if you are prepared to wait
for the models to fit and you have sufficient data in each species to fit a
complex model like a GAM.

G

On 10 February 2015 at 10:28, Tim Meehan tme...@gmail.com wrote:

 If you want to do this in a glm framework, you might look into the mvabund
 package:

 http://cran.r-project.org/web/packages/mvabund/mvabund.pdf

 I've never used it with anything approaching 1000 species, though.

 On Tue, Feb 10, 2015 at 2:41 AM, Rajendra Mohan panda 
 rmp.iit@gmail.com
  wrote:

  Dear All
 
  I have 1000 species with presence and absence (0 or 1) values and with
  seven corresponding predictor variables. If I can run gam/glm for the
 data
  using all species data simultaneously vs predictors. Data are arranged in
  columns against their GPS locations (see below). I know it is possible to
  do separately for each species.
 
  Your kind response is highly appreciated.
 
  Sites  Sp1  Sp2  Sp3  Alt  Temp  Pptn  Ft
  1A     0    1    1    20   30    1000  Evergreen
 
  With Best Regards
  Rajendra M Panda
  School of Water Resources
  Indian Institute of Technology Kharagpur, India
 
 





-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Using multiple species data for gam

2015-02-10 Thread Gavin Simpson
I doubt VGAM's CAO will be able to handle such data, the computational
burden is just huge under that modelling framework.

The old way would be to just do CCA (Canonical Correspondence Analysis)
on those data, though I may get told off by David Warton for suggesting it
:-)

G

On 10 February 2015 at 03:53, Eduard Szöcs szo...@uni-landau.de wrote:

 Dear Rajendra,

 your post reminds me of constrained additive ordination [1].

 [1] Yee, T. W. (2006) Constrained additive ordination. Ecology, 87,
 203–213.

 All the best,

 Eduard Szöcs

 On 10/02/15 10:41, Rajendra Mohan panda wrote:
  Dear All
 
  I have 1000 species with presence and absence (0 or 1) values and with
  seven corresponding predictor variables. If I can run gam/glm for the
 data
  using all species data simultaneously vs predictors. Data are arranged in
  columns against their GPS locations (see below). I know it is possible to
  do separately for each species.
 
  Your kind response is highly appreciated.
 
  Sites  Sp1  Sp2  Sp3  Alt  Temp  Pptn  Ft
  1A     0    1    1    20   30    1000  Evergreen
 
  With Best Regards
  Rajendra M Panda
  School of Water Resources
  Indian Institute of Technology Kharagpur, India
 
 

 --
 Eduard Szöcs
 Quantitative Landscape Ecology
 Institute for Environmental Sciences
 University Koblenz-Landau
 Tel. +49 6341 280 31552

 http://www.uni-koblenz-landau.de/campus-landau/faculty7/environmental-sciences/landscape-ecology/Staff/eduardszoecs





-- 
Gavin Simpson, PhD


Re: [R-sig-eco] GAM - Cyclic splines

2015-01-22 Thread Gavin Simpson
To the best of my knowledge this works as per mgcv::gam and mgcv::gamm; the
only code in **gamm4** is that needed to fit the GAMM via **lme4**, and
everything else leverages existing code in **mgcv**.

Whilst it would help to give more informative names than X1 and X2, and so
on to your variables, I think the following should work for the `bs =
"cc"` term:

Model <- gamm4(Y ~ s(X1, k = 5, bs = "cc") + s(X2, bs = "cr") + offset(log(X3)),
               random = ~ (1 | X4), data = data, family = poisson,
               knots = list(X1 = c(1, 366)))

Not sure why you want c(0, 366) as wouldn't that be 367 days?

Anyway, you pass knots a list with components named as per the variables
used in the smooth terms. If you don't supply knots for a smooth term, the
default locations are used. In the case of cyclic cubic splines, one can
simply get by with passing the end points of the knots as I did above, and
the remaining knots (as determined by k) are distributed evenly across the
interior of the stated two boundary knots.

I don't see why, if suitably fitted, predictions for day 1 or day 366
should differ markedly. Were you predicting at the same values of X2, X3,
and X4 when you compared the outputs?
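
A small check of that last point, as a sketch only: it uses mgcv::gam()
rather than gamm4(), simulated counts, and a made-up variable name `doy`,
so treat it as an illustration of the boundary-knot behaviour and not of
your model. With the two boundary knots supplied, the cyclic smooth is
forced to take the same value at both ends, so predictions there agree:

```r
library(mgcv)

# Simulated data observed only between day 6 and day 300 (hypothetical)
set.seed(1)
doy <- sample(6:300, 200, replace = TRUE)
y <- rpois(200, lambda = exp(1 + 0.5 * sin(2 * pi * doy / 365)))
dat <- data.frame(y = y, doy = doy)

# Cyclic cubic spline; only the two boundary knots are given, the
# remaining knots implied by k are spread evenly between them
m <- gam(y ~ s(doy, bs = "cc", k = 5), data = dat, family = poisson,
         knots = list(doy = c(1, 366)), method = "REML")

# Predictions at the two boundary knots, on the response scale
p <- predict(m, newdata = data.frame(doy = c(1, 366)), type = "response")
p
```

If the two values differ in your own fit, the first thing I'd check is
that the other covariates (and the offset) are held at the same values in
both rows of `newdata`.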

HTH

G

On 22 January 2015 at 13:40, SamiC samantha@plymouth.ac.uk wrote:

 Hi,

 I am trying to use cyclic splines in gamm4 to model day of year.  I have data
 running from say day 6 to day 300 but I want to specify that the outer limits of
 the data are at 0 and 366.  I don't want to fix the inner positions of the
 knots, however (I allow for a total of 5 knots).  Is there a way to do this?
 I also have other explanatory variables in the model that are not cyclic.

 Example:
 Model <- gamm4(Y ~ s(X1, fx = FALSE, k = 5, bs = "cc") + s(X2, fx = FALSE, k = 5,
 bs = "cr") + offset(log(X3)), random = ~ (1 | X4), data = data, family = poisson)

 Finally, when I try to plot the predictions on the response scale, the values
 predicted at 0 and 366 are vastly different and do not match, as they do in
 the output from plot(Model$gam, se = TRUE).

 Any help on this would be much appreciated.

 Cheers

 Sam







-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Help in code for fitting a growth model in R

2014-11-16 Thread Gavin Simpson
You *can't* send docx files as attachments to this list; most attachments
get stripped. Consider doing something useful with the content of the docx
and put the content as plain text into an email and send that.

HTH

G

On 16 November 2014 05:46, Chandrasekhar Rudrappa chandr...@gmail.com
wrote:

 I have to fit a model to growth data of Hevea (rubber) trees. The details
 are outlined in the attached docx file.  Kind help is solicited.
 --
 Dr. TR Chandrasekhar, M.Sc., M. Tech., Ph. D.,
 Sr. Scientist
 Rubber Research Institute of India
 Hevea Breeding Sub Station
 Kadaba - 574 221
 DK Dt., Karnataka
 Phone-Land: 08251-214336
 Mobile: 9448780118





-- 
Gavin Simpson, PhD



Re: [R-sig-eco] envfit permutation significance

2014-11-10 Thread Gavin Simpson
Well, just because the vector(s) in question don't hit some a priori
specified statistical significance doesn't mean the vector doesn't exist;
we can still plot it.

As for interpretation, the length of the vector is not unusual when
considered against a null distribution of r values generated by
permutation. This doesn't mean the vector is random, just that it is no
more important than a vector in ordination space where there was no
relationship between the axis scores and the variable for which the vector
is being tested.

You need to ask yourself whether the permutation test is valid - are you
allowed to permute the things you permuted? - and whether you have
sufficient power to detect an effect, which usually boils down to having
enough unique permutations to get a significant p value.
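
To put a number on the "enough unique permutations" point, here is a
rough base R sketch. It assumes the samples are freely exchangeable (no
strata or other restrictions) and that all permutations are enumerated, in
which case the smallest achievable p value is one over the number of
permutations:

```r
# Number of free permutations of n samples and the smallest possible p value
n_sites <- 3:8
n_perms <- factorial(n_sites)
min_p <- 1 / n_perms
data.frame(n_sites, n_perms, min_p)

# With a random subset of permutations (e.g. 999) the observed ordering is
# counted too, so the floor is 1 / (999 + 1)
min_p_999 <- 1 / (999 + 1)
min_p_999
```

Restricted permutation designs (blocks, time series, and so on) can cut
the number of allowed permutations much further than this.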

If you want to restrict the plot to only those vectors that are
significant, then look at the `p.max` argument in `?plot.envfit`.

HTH

Gavin

On 10 November 2014 22:08, jsgro js...@jcu.edu wrote:

 I have fitted a vector (a variable) using envfit to an RDA of sites,
 species,
 and environmental variables. The envfit permutation (999 permutations) was
 not significant, but envfit placed the vector on the RDA plot. How exactly
 do I interpret this plot? Is the placement of the vector meaningless (just
 random) because the permutation result was not significant? or does the
 vector placement have meaning, but it just cannot be predicted from the RDA
 with confidence?







-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Logistic regression with 2 categorical predictors

2014-10-24 Thread Gavin Simpson
Hi Andrew,

On 24 October 2014 01:41, Andrew Halford andrew.half...@gmail.com wrote:

 Dear Gavin,

 Firstly let me say that I take offence at your bogus comment. Just
 because I, like many others who interact on this list, often struggle
 conceptually with the overwhelming analysis choices that are required in
 our line of work doesn't give you the right to drop snide remarks as you
 see fit.


Sorry about that; I meant bogus in the sense of its synonym, spurious. I
overlooked that it also has other synonyms that could easily be
interpreted as you have, as me suggesting something nefarious was being
done. That was *not* my intention and I do apologise, as I have clearly
caused some offence where none was intended.

So perhaps I should have said: the results you show are spurious; don't
bother interpreting the model because you are overfitting the data. In fact
you are fitting it perfectly (to within numeric precision anyway).


 My line of query is ALWAYS genuine from my perspective and I don't expect
 you or anyone else to belittle people on the public list!

 As it turns out my issues are not resolved.

 To recap..

 I have run a bunch of choice chamber experiments with larval fish.
 Graphing up the ratio of 0/1 choices produces a plot which shows to my eye,
 evidence of a result for some of the tests, with fish appearing to make
 defined choices in the later age groups for 2 of the tests.

 What appears to be happening is that because there are some empty cells in
 the later age x test interactions (the fish only took one option to the
 exclusion of the other) the errors are way out and hence preclude any
 chance of getting a significant result. If I add a single result to any of
 the zero cells to remove the blank the analysis actually works more as I
 hoped. However I doubt this is acceptable so I am hoping to get some help
 with producing an effective analysis without having to manipulate the blank
 cells.


I'm confused; how would you go about manipulating blank cells? If you
have more data use it - you certainly can't fit the full model (or two main
effects and their interaction) with the data you currently have. The model
is saturated in that you've fitted as many coefficients as there are data
points; you've replaced the existing response data with a vector of the
same length containing the estimates of the coefficients from the model. In
a sense, you've just transformed your response through a complex procedure.
Nothing else.

As such, you have no basis for then interpreting the coefficients or doing
pairwise comparisons. The model you are fitting is just too complex.

This can happen in experiments where there is insufficient replication of the
levels of the factors and their combinations, which seems to be what has
happened here because of those darned stubborn fish.

I hope I've done a better job of i) not annoying you with poorly chosen
words, and ii) explaining why I think you should stop with the current
model, as it is too complex for your data. You can't test for an
interaction, but you could remove the interaction and just test for main
effects, unless you can get some more data.
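
To make the counting argument concrete, here is a sketch with made-up
data laid out like yours (6 ages by 4 tests, one row of counts per cell).
The counts are simulated and the test labels A to D are placeholders, so
only the degrees of freedom matter here, not the estimates:

```r
# One row of counts per age x test cell: 24 rows in total (simulated)
set.seed(10)
dat <- expand.grid(age = factor(1:6), test = factor(c("A", "B", "C", "D")))
dat$prefer <- rbinom(nrow(dat), size = 20, prob = 0.5)
dat$avoid <- 20 - dat$prefer

# Saturated model: main effects plus interaction uses all 24 degrees of freedom
full <- glm(cbind(prefer, avoid) ~ age * test, data = dat, family = binomial)

# Main effects only: 1 + 5 + 3 = 9 coefficients, leaving 15 residual df
main <- glm(cbind(prefer, avoid) ~ age + test, data = dat, family = binomial)

c(rows = nrow(dat), coefs = length(coef(full)), resid_df = df.residual(full))
df.residual(main)
```

The saturated fit reproduces the observed proportions exactly, which is
why there is nothing left over against which to judge the coefficients.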

G




 Andrew




 On 24 October 2014 04:08, Gavin Simpson ucfa...@gmail.com wrote:

 This all looks bogus to me; you've fit the data perfectly by fitting a
 saturated model - there are no residual degrees of freedom and
 (effectively) zero residual deviance. Things are clearly amiss because you
 have huge standard errors. You have 24 data points and fit a model with 23
 coefficients plus the intercept; you just replaced your data with 24 new
 data points (the values in the Estimate column of the summary() output).

 I really wouldn't bother interpreting it any further.

 HTH

 G

 On 21 October 2014 18:21, Andrew Halford andrew.half...@gmail.com
 wrote:

 Hi Thierry,

 The multiple comparisons ran just fine but there was a ridiculous amount
 of
 interaction combinations all of which were non-significant even though
 there was a highly significant interaction term. I decided to remove test
 as a variable to simplify the analysis and run separate single
 explanatory
 variable logistic regressions. I have included a result below which is
 still producing an outcome I cant explain. Namely, why am I getting such
 a
 significant result for the ANOVA but when I do the tukey tests nothing is
 significant?

  sg_habitat
   Age Prefer Avoid
 1   1     17    14
 2   2     20    10
 3   3     14     9
 4   4     13    12
 5   5      0    18
 6   6      0     5

  model_sg <- glm(cbind(Prefer, Avoid) ~ Age, data = sg_habitat,
 family = binomial)

  anova(model_sg, test = "Chisq")

 Analysis of Deviance Table

 Model: binomial, link: logit

 Response: cbind(Prefer, Avoid)

 Terms added sequentially (first to last)


       Df Deviance Resid. Df Resid. Dev   Pr(Chi)
 NULL                      5     36.588
 Age    5   36.588         0      0.000 7.243e-07 ***


  mc_sg <- glht(model_sg, mcp(Age = "Tukey"))

  summary(mc_sg

Re: [R-sig-eco] Logistic regression with 2 categorical predictors

2014-10-23 Thread Gavin Simpson
  age5:testSW  1.678e+00  9.386e-01   1.788   0.0737 .
  age6:testSW  2.626e+01  9.348e+04   0.000   0.9998
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
  (Dispersion parameter for binomial family taken to be 1)
 
  Null deviance: 5.4908e+01  on 23  degrees of freedom
  Residual deviance: 2.6113e-10  on  0  degrees of freedom
  AIC: 122.73
 
  Number of Fisher Scoring iterations: 23
 
 
   anova(out2, test = "Chisq")
 
  Analysis of Deviance Table
 
  Model: binomial, link: logit
 
  Response: cbind(prefer, avoid)
 
  Terms added sequentially (first to last)
 
 
            Df Deviance Resid. Df Resid. Dev   Pr(Chi)
  NULL                         23     54.908
  age        5   11.235        18     43.673 0.0469115 *
  test       3    1.593        15     42.079 0.6608887
  age:test  15   42.079         0      0.000 0.0002185 ***
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
  cheers
 
  Andy
  * * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
  Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver
  weer en binden het INBO onder geen enkel beding, zolang dit bericht niet
  bevestigd is door een geldig ondertekend document.
  The views expressed in this message and any annex are purely those of the
  writer and may not be regarded as stating an official position of INBO,
 as
  long as the message is not confirmed by a duly signed document.
 
 
 
  --
  Andrew Halford Ph.D
  Research Scientist (Kimberley Marine Parks)|  Adjunct Research Scientist
  (Curtin University)
  Dept. Parks and Wildlife
  Western Australia
 
  Ph: +61 8 9219 9795
  Mobile: +61 (0) 468 419 473
 



 --
 Andrew Halford Ph.D
 Research Scientist (Kimberley Marine Parks)|  Adjunct Research Scientist
 (Curtin University)
 Dept. Parks and Wildlife
 Western Australia

 Ph: +61 8 9219 9795
 Mobile: +61 (0) 468 419 473





-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-23 Thread Gavin Simpson
I think there are actually 4 data points per level of some factor (after
seeing some of the other non-threaded emails; why can't people use email
clients that preserve threads?**); but yes, either way this is a small data
set and trying to decide if residuals are normal or not is going to be nigh on
impossible.

I like the suggestion that someone made to actually do some simulation to
work out whether you have any power to detect an effect of a given size;
it seems pointless doing the analysis if your conclusion would be "well, I
didn't detect an effect, but I have no power so I don't even know if I
should have been able to detect an effect if one were present". You'd be in
no worse a position then than if you hadn't run the analysis or
collected the data.
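
A minimal sketch of that kind of simulation in base R follows. Everything
in it is an assumption for illustration: three factor levels, 4
observations per level, Gaussian errors with unit variance, a one-way
lm(), and effect sizes expressed in units of the residual SD. Swap in
whatever matches the real design:

```r
set.seed(123)

# Estimate power by simulation: the proportion of simulated data sets in
# which the overall F test for the factor is significant at level alpha
power_sim <- function(effect, n_per_level = 4, n_levels = 3,
                      n_sim = 500, alpha = 0.05) {
  g <- factor(rep(seq_len(n_levels), each = n_per_level))
  # Level means spread evenly from 0 up to `effect`
  mu <- seq(0, effect, length.out = n_levels)[as.integer(g)]
  p <- replicate(n_sim, {
    y <- rnorm(length(g), mean = mu, sd = 1)
    anova(lm(y ~ g))[["Pr(>F)"]][1]
  })
  mean(p < alpha)
}

pow_null <- power_sim(0)  # no true effect: should sit near alpha
pow_big <- power_sim(3)   # a large effect, 3 residual SDs between extremes
c(null = pow_null, big = pow_big)
```

Running this over a grid of plausible effect sizes shows quickly whether
4 observations per level can detect anything you would care about.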

G

** He says, hoping to heck that GMail preserves the threading information...

On 23 October 2014 14:00, Jari Oksanen jari.oksa...@oulu.fi wrote:


 On 23/10/2014, at 18:17 PM, Gavin Simpson wrote:

  On 22 October 2014 17:24, Chris Howden ch...@trickysolutions.com.au
 wrote:
 
  A good place to start is by looking at your residuals  to determine if
  the normality assumptions are being met, if not then some form of glm
  that correctly models the residuals or a non parametric method should
  be used.
 
 
  Doing that could be very tricky indeed; I defy anyone, without knowledge
 of
  how the data were generated, to detect departures from normality in such
 a
  small data set. Try qqnorm(rnorm(4)) a few times and you'll see what I
 mean.
 
  Second, one usually considers the distribution of the response when
 fitting
  a GLM, not decide if residuals from an LM are non-Gaussian then move on.
  The decision to use the GLM should be motivated directly from the data
 and
  question to hand. Perhaps sometimes we can get away with fitting the LM,
  but that usually involves some thought, in which case one has probably
  already thought about the GLM as well.

 I agree completely with Gavin. If you have four data points and fit a
 two-parameter linear model and in addition select a one-parameter
 exponential family distribution (as implied in selecting a GLM family) you
 don't have many degrees of freedom left. I don't think you get such models
 accepted in many journals. Forget the regression and get more data. Some
 people suggested here that an acceptable model could be possible if your
 data points are not single observations but means from several
 observations. That is true: then you can proceed, but consult a
 statistician on the way to proceed.

 Cheers, Jari Oksanen




-- 
Gavin Simpson, PhD



Re: [R-sig-eco] EstimateR and Chao 1 standard deviation

2014-10-17 Thread Gavin Simpson
On 17 October 2014 06:44, Bob O'Hara boh...@senckenberg.de wrote:

 On 17/10/14 13:54, José M. Blanco Moreno wrote:

 Dear users

 I have been checking the values returned by the function estimateR
 against the formulae in the appendix to nonparametric estimators of
 species richness in EstimateS
 (http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#AppendixB)...
 and they do not match each other.

 Where does this line of code in vegan:::estimateR.default come from?

 sd.Chao1 <- sqrt(a[2] * ((G^4)/4 + G^3 + (G^2)/2))

 Is it from some other reference that I should be aware of?

  Unless Jari added something, the functions come from my paper from about
 10 years ago:
 O’Hara RB (2005) Species richness estimators: how many species can dance
 on the head of a pin? J Anim Ecol 74: 375–386.
 http://doi.wiley.com/10./j.1365-2656.2005.00940.x, which refers to
 Chao (1987), which is cited in the estimateR documentation. EstimateS now
 uses a small sample correction:
 http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#Chao1AndChao2
 Does this account for the discrepancy?

 Bob


Yep, vegan does not include the small sample correction. The Chao1
computation is here:

https://github.com/vegandevs/vegan/blob/master/R/estimateR.default.R#L44

but we can certainly add it if this is useful? (And I suppose it is if
EstimateS has included it by default now...)
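
For anyone following along, here is a standalone sketch of the two
versions. It is not the vegan source: the function name `chao1` and its
return structure are made up for illustration, the SD is the quoted
sd.Chao1 line with G = a1/a2, and the bias-corrected form is the usual
S_obs + a1(a1 - 1) / (2(a2 + 1)) as I understand the EstimateS guide:

```r
# counts: vector of abundances for one sample (hypothetical helper)
chao1 <- function(counts) {
  counts <- counts[counts > 0]
  s_obs <- length(counts)   # observed richness
  a1 <- sum(counts == 1)    # number of singletons
  a2 <- sum(counts == 2)    # number of doubletons
  G <- a1 / a2              # undefined when there are no doubletons
  list(
    classic = s_obs + a1^2 / (2 * a2),
    bias_corrected = s_obs + a1 * (a1 - 1) / (2 * (a2 + 1)),
    sd_classic = sqrt(a2 * (G^4 / 4 + G^3 + G^2 / 2))
  )
}

est <- chao1(c(1, 1, 2, 2, 3, 5))
est
```

The two estimates converge as the singleton and doubleton counts grow,
so any discrepancy should mostly show up in small samples.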

G


 --

 Bob O'Hara

 Biodiversity and Climate Research Centre
 Senckenberganlage 25
 D-60325 Frankfurt am Main,
 Germany

 Tel: +49 69 7542 1863
 Mobile: +49 1515 888 5440
 WWW:   http://www.bik-f.de/root/index.php?page_id=219
 Blog: http://blogs.nature.com/boboh
 Journal of Negative Results - EEB: www.jnr-eeb.org






-- 
Gavin Simpson, PhD



Re: [R-sig-eco] EstimateR and Chao 1 standard deviation

2014-10-17 Thread Gavin Simpson
On 17 October 2014 10:04, Bob O'Hara boh...@senckenberg.de wrote:

  On 10/17/2014 05:36 PM, Gavin Simpson wrote:

  On 17 October 2014 06:44, Bob O'Hara boh...@senckenberg.de wrote:

 On 17/10/14 13:54, José M. Blanco Moreno wrote:

 Dear users

 I have been checking the values returned by the function estimateR
 against the formulae in the appendix to nonparametric estimators of
 species richness in EstimateS
 (
 http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#AppendixB)...

 and they do not match each other.

 Where does this line of code in vegan:::estimateR.default come from?

 sd.Chao1 <- sqrt(a[2] * ((G^4)/4 + G^3 + (G^2)/2))

 Is it from some other reference that I should be aware of?

  Unless Jari added something, the functions come from my paper from
 about 10 years ago:
 O’Hara RB (2005) Species richness estimators: how many species can dance
 on the head of a pin? J Anim Ecol 74: 375–386.
 http://doi.wiley.com/10./j.1365-2656.2005.00940.x, which refers to
 Chao (1987), which is cited in the estimateR documentation. EstimateS now
 uses a small sample correction: 
 http://viceroy.eeb.uconn.edu/EstimateS/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#Chao1AndChao2
 
 Does this account for the discrepancy?

 Bob


  Yep, vegan does not include the small sample correction. The Chao1
 computation is here:

  https://github.com/vegandevs/vegan/blob/master/R/estimateR.default.R#L44

  but we can certainly add it if this is useful? (And I suppose it is if
 EstimateS has included it by default now...)

Might as well. It looks like it shouldn't make a huge difference
 unless sample sizes are small.

 Bob


Okay, I'll see about getting this into vegan 2.2-0 which is about ready for
release in the next couple of weeks.

Cheers

G


   G


 --

 Bob O'Hara

 Biodiversity and Climate Research Centre
 Senckenberganlage 25
 D-60325 Frankfurt am Main,
 Germany

 Tel: +49 69 7542 1863
 Mobile: +49 1515 888 5440
 WWW:   http://www.bik-f.de/root/index.php?page_id=219
 Blog: http://blogs.nature.com/boboh
 Journal of Negative Results - EEB: www.jnr-eeb.org






  --
 Gavin Simpson, PhD



 --
 Bob O'Hara

 Biodiversity and Climate Research Centre
 Senckenberganlage 25
 D-60325 Frankfurt am Main,
 Germany

 Tel: +49 69 7542 1863
 Mobile: +49 1515 888 5440
 WWW:   http://www.bik-f.de/root/index.php?page_id=219
 Blog: http://blogs.nature.com/boboh
 Journal of Negative Results - EEB: www.jnr-eeb.org




-- 
Gavin Simpson, PhD



Re: [R-sig-eco] SIMPER problem: invalid 'nrow' value (too large or NA)

2014-10-15 Thread Gavin Simpson
On 15 October 2014 07:24, Jari Oksanen jari.oksa...@oulu.fi wrote:


 On 14/10/2014, at 21:41 PM, mastratton wrote:

  markusvlindh wrote
  Dear all,
 
  I'm having difficulty applying the SIMPER analysis found in vegan,
  following the example provided in the help page for simper. I keep
  receiving the following error message:
 
  Error in matrix(ncol = P, nrow = n.a * n.b) :
   invalid 'nrow' value (too large or NA)
 
  My data consist of a community matrix with 200 species and 43 dates
 (class
  = data.frame) and my groups consists of factors with in total 12
 levels.
 
  A mock example like the following does work:
  library(vegan)
  community <- data.frame(replicate(43, sample(0:1000, 200, rep = TRUE)))

  groups <- as.factor(replicate(1, sample(c("Alpha", "Beta", "Gamma", "Epsilon",
    "Bact", "Actino", "Verr", "Unclass", "Cyano", "Plancto", "Eury", "Chloro"),
    200, rep = TRUE)))
  simper_test <- simper(community, groups)
  summary(simper_test)
 
  But please see the attached files for true data that is not working.
 
  Could someone please please assist in what is the problem with my data.
 
  Kind regards!
 
  Markus,
 
  I was getting the same error message and discovered that simper() is not
  written to handle an input 'group' factor that has one or more unique
 values
  with only one occurrence. Your data (reattached) have two of these
  instances:.
 
 ...
  The source code for simper() can be modified to allow these instances:
 
 Yes, the source can be modified *and* it has been modified to cope with
 one-member groups. You can install the modified version of vegan for
 Windows, or if you have programming tools for other OS's, too, using:

 install.packages("vegan", repos = "http://R-Forge.R-project.org")

 This is development version, but we are going to release new CRAN version
 of vegan later this month, and the R-Forge version is very close to the
 release version. (Looks like the build system is down again in R-Forge, and
 has been for a week, so that this is not the most recent and stable version
 of R-Forge, but it is mostly safe.)

 Cheers, Jari Oksanen


and if you want something really up-to-date, then we do have binary builds
on Windows courtesy of the Appveyor continuous integration service.
https://ci.appveyor.com/project/gavinsimpson/vegan and click on the
artefacts tab.

G






-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Help with a function

2014-06-06 Thread Gavin Simpson
I don't think it is good general advice to suggest that people use
`subset()` in a function. You have to be ever so clever to make this work
right once you call `foo()` inside another function, because the environment
in which `eval(cond)` evaluates the `condition` won't be the same
as when you call `foo()` at the top level. There's a reason for the Warning
section in `?subset`.
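
One way to sidestep the issue, sketched here with a made-up helper name
(`sample_rows`), is to have the caller compute the logical condition and
pass it in, so the function only ever does standard `[` indexing and
nothing is evaluated non-standardly:

```r
# keep: logical vector, one element per row of df
# ...:  further arguments passed on to sample.int() (e.g. size, replace)
sample_rows <- function(df, keep, ...) {
  rows <- which(keep)                            # row numbers meeting the condition
  picked <- rows[sample.int(length(rows), ...)]  # sample among those rows only
  df[picked, , drop = FALSE]
}

# Same example data as in the message quoted below
mydatabase <- data.frame(
  Idf1 = c(12, 14, 15, 16, 17, 18, 19, 21, 25, 24, 26, 28, 29, 32, 33, 35, 36, 37, 48),
  casod = c(1, 1, 1, 1, 3, 3, 3, 3, 1, 1, 1, 1, 3, 3, 3, 1, 3, 1, 3)
)

set.seed(1)
out <- sample_rows(mydatabase, mydatabase$casod != 1, size = 5, replace = FALSE)
out
```

Because the condition is an ordinary vector by the time it reaches the
function, this behaves the same whether it is called at the top level or
from inside another function.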

G


On 6 June 2014 13:48, Sargeant, Glen gsarge...@usgs.gov wrote:

 As you are trying to learn to write functions, I'll submit a very
 general solution that illustrates a couple of core skills.  It returns a
 random sample of rows for any dataframe and any condition, and accepts
 optional arguments to sample.

 #Example data
 Idf1 <- c(12,14,15,16,17,18,19,21,25,24,26,28,29,32,33,35,36,37,48)
 casod <- c(1,1,1,1,3,3,3,3,1,1,1,1,3,3,3,1,3,1,3)
 mydatabase <- data.frame(Idf1,casod)

 #Required arguments are `df', a dataframe, and `condition',
 #a character string.  The function also accepts optional
 #arguments to `sample'

 foo <- function(df,condition,...){
   cond <- parse(text=condition)    #turn the string into an expression
   df. <- subset(df,eval(cond))     #keep rows meeting the condition
   idx <- sample(1:nrow(df.),...)   #sample row numbers of the subset
   df.[idx,]
 }

 foo(df=mydatabase,condition="casod!=1",size=5,replace=FALSE)
 foo(df=mydatabase,condition="casod!=1",size=10,replace=TRUE)








 On Fri, Jun 6, 2014 at 8:39 AM, Rodrigues rodrigues...@gmail.com wrote:

  Dear R users,
 
 
  I'm trying to build a function to select random samples of idfs from a
  database. My data frame has 2 columns and 575 rows. An example of my
  database follows:
  Idf1  casod
  12  1
  14  1
  15  1
  16  1
  17  3
  18  3
  19  3
  21  3
  25  1
  24  1
  26  1
  28  1
  29  3
  32  3
  33  3
  35  1
  36  3
  37  1
  48  3
 
  So my function is

  blinding=function(sample){
    sort=sample(idf1,10,replace=F)
    return(sort)
  }

  It is pretty simple and I would like to add one more step to my choice. I
  would like to link my choice to the casod status. That is, if casod==3, the
  sample should be random idfs and could not include an idf with casod=1. Can
  someone help me?
 
 
 
 
 



 --
 Glen Sargeant, Ph.D.
 Research Wildlife Biologist/Statistician
 USGS Northern Prairie Wildlife Research Center
 E-mail: gsarge...@usgs.gov
 Phone: (701) 253-5528





-- 

Gavin Simpson, PhD



Re: [R-sig-eco] calculating standard error of coefficients from adonis model

2014-05-09 Thread Gavin Simpson
Rafter,

The permutation test goes nowhere near the coefficients. IIRC it works on
the sum of squares decomposition directly.

Jari has already passed on more info about the bootstrapping idea.
Basically the boot package has functions that will apply your function to
bootstrap samples of the original data. Your function, which you need to
write, will be a wrapper around adonis() that fits the model to the bootstrap
sample and extracts the coefficients you want. boot takes this output and
computes the relevant bootstrap statistics, one of which can be the
standard error.

You'll need to study the boot package to work out how best to write your
wrapper. As the coefficients in the adonis() object are a matrix, not a
single value, you may need to think through how best to fit this into what
boot needs. For example you might extract a single row from the coefficient
matrix thus focussing on a selected variable for all species, and do
separate boot() calls for each of the variables. But I'm thinking about
this from memory so don't take this as gospel; do your homework on boot
well.
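
To show the shape of such a wrapper, here is a sketch that uses lm() on
simulated data as a stand-in for adonis(); the data, the variable names
and `coef_fun` are all made up for illustration. The part that carries
over is the `function(data, indices)` signature that boot() expects and
the fact that the statistic has to return a plain numeric vector:

```r
library(boot)

# Simulated stand-in data (hypothetical)
set.seed(2014)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(50)

# Statistic: refit the model to the resampled rows and return the coefficients
coef_fun <- function(data, indices) {
  fit <- lm(y ~ x1 + x2, data = data[indices, , drop = FALSE])
  coef(fit)
}

b <- boot(data = dat, statistic = coef_fun, R = 200)

# Bootstrap standard errors: SD of each coefficient over the replicates
boot_se <- apply(b$t, 2, sd)
names(boot_se) <- names(b$t0)
boot_se
```

For adonis() you would resample rows of the community data and the
predictors together and return, say, one row of the coefficient matrix
as the vector.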

HTH

G


On 9 May 2014 11:29, Rafter liberationecol...@gmail.com wrote:

 Hi Jari,

 Thanks so much for your clarification and expansion of Gavin's comments.

 I was confused by the application of bootstrapping to a function that is
 already
 using a permutation procedure, but I see now that bootstrapping the
 regression
 coefficients would operate independently of adonis' permutation tests.

 I'll try and apply better standards for clarity in the future before
 deciding I
 don't understand something. ;)

 Warmly,
 Rafter








-- 

Gavin Simpson, PhD



Re: [R-sig-eco] Vegan-Adonis-NMDS-SIMPER

2014-03-28 Thread Gavin Simpson
Hi Steve,

I agree with your points here; I simply wanted to avoid the impression
that `betadisper()` did anything with the centroids. It did seem like
the OP and some others had got this impression.

I also agree that the PCoA way of computing the centroids is a useful
tool not just for `betadisper()`; there is no reason that you should have
to run `betadisper()` just to get that information.
I'll see about moving this functionality out of `betadisper()`, where it
is currently embedded, into a user-visible function that
`betadisper()` can use internally.

G

On 27 March 2014 12:28, Steve Brewer jbre...@olemiss.edu wrote:
 Gavin and Brandon,

 Yes, I am aware that betadisper() does not actually give you a test of
 differences between centroids, but the fact that it does calculate
 centroids is quite valuable for interpretation, in my opinion, especially
 when using non-euclidean distance matrices (e.g., Bray-Curtis) and also if
 you would prefer NOT to do additional pairwise tests between levels, but
 still would like to have some idea as to which pairwise differences
 between levels might be most responsible for the effect. When using
 bray-curtis distances, you can't get centroids by calculating averages of
 abundances among the observations of interest. If you just want to use a
 NMDS ordination with levels symbol-coded to make them distinct, that's
 fine. Sometimes folks calculate the average axis score per group or level
 of group and plot that. That's fine, too. The nice thing about obtaining
 centroids calculated using betadisper() is that they are based on a
 principal coordinates analysis that uses ALL the axes, not just the first
 two or three axes in the ordination. It is likely that if the first two or
 three axes of the NMDS explain most of the important variation, the
 average scores per level for those three axes will probably tell the same
 information as the centroids will.

 Even though it wasn't intended for this purpose, Sharon Graham and I,
 together, figured out that you could use the centroids calculated by
 betadisper() to analyze split-plot and repeated-measures designs using
 adonis. So, its value extends beyond what it was intended for.


 Steve

 J. Stephen Brewer
 Professor
 Department of Biology
 PO Box 1848
  University of Mississippi
 University, Mississippi 38677-1848
  Brewer web page - http://home.olemiss.edu/~jbrewer/
 FAX - 662-915-5144
 Phone - 662-915-1077




 On 3/27/14 10:47 AM, Gavin Simpson ucfa...@gmail.com wrote:

Note that `betadisper()` only considers, statistically, the dispersions
about the group centroids. It might show the centroids and return
their values, but it doesn't consider differences in those centroids.
As far as `betadisper()` is concerned, the group centroids could all
be made exactly equal and it wouldn't change the results, as it is only
the spread about the centroid that is used.
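
A minimal sketch of this point, not code from the thread. It assumes vegan is installed and uses simulated two-dimensional data with Euclidean distances and `type = "centroid"` so that the comparison is clean. Shifting one group's centroid should leave every distance-to-centroid, and hence anything `betadisper()` tests, unchanged:

```r
library(vegan)

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)            # 20 observations, 2 variables
g <- factor(rep(c("a", "b"), each = 10))    # two groups of 10

# Copy the data and move group "b" far away from group "a"
x_shift <- x
x_shift[g == "b", ] <- x_shift[g == "b", ] + 5

# Distances of each observation to its own group centroid
d_orig  <- betadisper(dist(x), g, type = "centroid")$distances
d_shift <- betadisper(dist(x_shift), g, type = "centroid")$distances

# Compare the two sets of distances (tolerance allows for rounding error)
same <- isTRUE(all.equal(unname(d_orig), unname(d_shift), tolerance = 1e-6))
print(same)
```

The centroids differ hugely between the two runs, but the within-group spread that `betadisper()` works with does not.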

HTH

G

On 27 March 2014 06:47, Brandon Gerig bge...@nd.edu wrote:
 Hi Steve,

 Yes, this is precisely what I am interested in doing. It seems like
 betadisper might be a good way to visualize differences/similarities in
 the dispersion and examine differences among centroids for the levels
 within a factor. Am I correct in thinking that if I conduct additional
 PERMANOVA tests on a reduced data set, I could be evaluating differences
 between the levels of a main effect?

 Could anyone provide a citation for a paper that uses a similar
 procedure?


 On Wed, Mar 26, 2014 at 3:21 PM, Steve Brewer jbre...@olemiss.edu wrote:

 Brandon,

 Are you asking if you can use betadisper as a substitute for post-anova
 pairwise comparisons among levels? After using betadisper to obtain
 dispersions, I believe you can plot the centroids for each level. In
 addition to telling you if the dispersions differ among levels, you
 could see how the centroids differ from one another. Is this what you
 want to know? If so, realize that it won't give you pairwise
 significance tests for differences between levels. For that, you might
 want to do additional permanovas on reduced datasets containing only
 the two levels you want to compare. You could then adjust the p-values
 for multiple tests after the fact.

 Hope this helps,

 Steve


 J. Stephen Brewer
 Professor
 Department of Biology
 PO Box 1848
  University of Mississippi
 University, Mississippi 38677-1848
  Brewer web page - http://home.olemiss.edu/~jbrewer/
 FAX - 662-915-5144
 Phone - 662-915-1077




 On 3/26/14 10:57 AM, Brandon Gerig bge...@nd.edu wrote:

 Thanks for the words of caution on simper.
 
 Am I completely off base in thinking that betadiver function
 (analogous to Levene's test) could be used to examine variation between
 levels within main effects?
 
 Cheers
 
 
 On Mon, Mar 24, 2014 at 5:08 PM, Brandon Gerig bge...@nd.edu wrote:
 
  I am assessing the level of similarity between PCB congener profiles
  in spawning salmon and resident stream in stream reaches with and
  without salmon to determine

Re: [R-sig-eco] cross validation in CoCA and CCA

2014-03-28 Thread Gavin Simpson
In short, no. I haven't ported the rough code for LOO CV of CCA or
CCA-PLS models. I think I ported the mean centring and crossval
functions from the Matlab sources, but not the code in the
`example_crossvalCCA.m` file from the supplementary materials on the
CoCA paper in Ecology.

I could take a look and see how easy it will be to add this, but it
doesn't sit well with cocorresp or vegan as the former was designed
really for CoCA and the latter doesn't have the other functionality
needed (which exists in cocorresp) and we've not really implemented CV
for ordination methods.

That said, this is R and it is relatively trivial to write your own
LOO or k-fold CV loop, and you can predict from a CCA model using the
`predict()` method for cca objects available in vegan.
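
A minimal sketch of what such a hand-written LOO loop can look like. It is not the CoCA cross-validation code from the paper: to stay self-contained in base R it uses a multivariate linear model (`lm()` with a matrix response) on simulated data as a stand-in, and the names `env`, `Y`, `press` and `q2` are invented for the illustration. With vegan, the fit and predict steps would be swapped for `cca()` and its `predict()` method; that substitution is an assumption and is not tested here.

```r
set.seed(1)
n <- 30

# Simulated "environment" and a two-column response matrix
env <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- cbind(sp1 = 2 + env$x1 + rnorm(n, sd = 0.5),
           sp2 = 1 - env$x2 + rnorm(n, sd = 0.5))

press <- 0                                   # prediction error sum of squares
for (i in seq_len(n)) {
  train  <- env[-i, , drop = FALSE]          # leave observation i out
  Ytrain <- Y[-i, , drop = FALSE]
  fit  <- lm(Ytrain ~ x1 + x2, data = train) # refit without observation i
  pred <- predict(fit, newdata = env[i, , drop = FALSE])
  press <- press + sum((Y[i, ] - pred)^2)    # error on the held-out row
}

# Cross-validated fit relative to predicting the column means
tss <- sum(sweep(Y, 2, colMeans(Y))^2)
q2  <- 1 - press / tss
print(q2)
```

For k-fold CV the only change is to leave out a block of rows per iteration instead of a single row.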

Part of the reason, at least as far as I see things, for not having CV
in the common ordination software (closed or open source) is that
these methods tend not to be seen as purely predictive models, which
is what CV is designed to evaluate.

Don't hold your breath for me getting this in cocorresp, but if you
want to follow up I might be persuaded to take a look and see if what
is already in cocorresp will enable you to follow the code in the
`example_crossvalCCA.m` file to write your own LOO code.

HTH

G

On 28 March 2014 14:57, Jesse Becker jcbecke...@gmail.com wrote:
 Hello list,
 I am doing a concordance study between riverine environmental conditions,
 invertebrate, and fish assemblages.  I am doing a predictive CoCA as part
 of the analysis with the cocorresp package.  My question is whether there
 is an implementation of the cross-validation procedure in the cocorresp
 package that would work on the results of a CCA or RDA, without having to
 use MATLAB (which I don't have access to)?  My understanding is that by
 doing the cross validation on the CCA (and hopefully RDA, although I've
 never seen it done) it allows for a more consistent evaluation of
 differences between the two methods.  I haven't seen this as a function in
 vegan.

 Jari?  Gavin?

 Thanks,
 Jesse


 Jesse C. Becker, Ph.D.
 765.285.8889 office
 512.587.4428 cell
 jcbec...@bsu.edu
 jcbecke...@gmail.com




-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Cosinor with data that trend over time

2014-03-26 Thread Gavin Simpson
1) Visually - unless it actually matters exactly on which day in the
year the peak is observed? If visually is OK, just do `plot(mod, pages
= 1)` to see the fitted splines on a single page. See `?plot.gam` for
more details on the plot method.

2) You could generate some new data to predict upon as follows:

newdat <- data.frame(DoY = seq_len(366), time = mean(foo$time))

Then predict for those new data but collect the individual
contributions of the spline terms to the predicted value rather than
just the final prediction

pred <- predict(mod, newdata = newdat, type = "terms")

Then find the maximal value of the DoY contribution

take <- which(pred$DoY == max(pred$DoY))
newdat[take, , drop = FALSE]

You could use

take <- which.max(pred$DoY)

instead of the `which()` I used, but only if there is a single maximal value.

This works because the spline terms in the additive model are just
that; additive. Hence because you haven't let the DoY and time splines
interact (in the simple model I mentioned, it is more complex if you
allow these to interact as you then need to predict DoY for each year's
worth of time points), you can separate DoY from the other terms.

None of the above code has been tested; it was written off the top of
my head, but it should work or at least get you pretty close to
something that works.
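
A self-contained sketch of the steps above, run on simulated data rather than the poster's series. Assumptions that are not from the thread: the simulated `foo`, the omission of the `corCAR1()` term (to keep the toy fit quick and stable), the boundary knots `c(0.5, 366.5)`, and the use of the `$gam` component with the `"s(DoY)"` column name, which is how a `gamm()` fit exposes the term contributions:

```r
library(mgcv)

set.seed(1)
n <- 200
foo <- data.frame(time = sort(runif(n, 0, 10)))      # time in years
foo$DoY <- floor((foo$time %% 1) * 365) + 1          # day of year, 1..365
# Seasonal cycle peaking near day 191, plus a slow trend and noise
foo$y <- sin(2 * pi * (foo$DoY - 100) / 365) + 0.1 * foo$time +
  rnorm(n, sd = 0.3)

mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo,
            knots = list(DoY = c(0.5, 366.5)))

# Predict the separate contribution of each smooth over one year
newdat <- data.frame(DoY = seq_len(366), time = mean(foo$time))
pred <- predict(mod$gam, newdata = newdat, type = "terms")

# Day of year at which the seasonal contribution is largest
peak_day <- newdat$DoY[which.max(pred[, "s(DoY)"])]
print(peak_day)
```

Because the model is additive, the value chosen for `time` in `newdat` does not affect where the `s(DoY)` contribution peaks.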

HTH

G

On 26 March 2014 10:02, Jacob Cram cramj...@gmail.com wrote:
 Thanks Gavin,
  This seems like a promising approach and a first pass suggests it works
 with this data. I can't quite figure out how I would go about interrogating
 the fitted spline to determine when the peak value happens with respect to
 DoY.  Any suggestions?
 -Jacob


 On Tue, Mar 25, 2014 at 9:06 PM, Gavin Simpson ucfa...@gmail.com wrote:

 I would probably attack this using a GAM modified to model the
 residuals as a stochastic time series process.

 For example

 require(mgcv)
 mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo,
  correlation = corCAR1(form = ~ time))

 where `foo` is your data frame, `DoY` is a variable in the data frame
 computed as `as.numeric(strftime(RDate, format = "%j"))` and `time` is
 a variable for the passage of time - you could do `as.numeric(RDate)`
 but the number of days is probably large and we might encounter more
 problems fitting the model. Instead you might do `as.numeric(RDate) /
 1000` say to produce values on a more manageable scale. The `bs =
 "cc"` bit specifies a cyclic spline applicable to data measured
 throughout a year. You may want to fix the start and end knots to be
 days 1 and days 366 respectively, say via `knots = list(DoY =
 c(0,366))` as an argument to `gam()` [I think I have this right,
 specifying the boundary knots, but let me know if you get an error
 about the number of knots]. The residuals are said to follow a
 continuous-time AR(1), the irregularly spaced counterpart to the AR(1),
 plus random noise.
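
A small base-R sketch of how the two covariates described above can be derived from a date column. The three dates are illustrative only (two are taken from the data posted in this thread, the 31 December one is added to show the leap-year case), and `RDate` is assumed to be of class `Date`:

```r
foo <- data.frame(RDate = as.Date(c("2004-02-19", "2004-12-31", "2005-07-19")))

# Day of year: "%j" formats a date as its day number within the year
foo$DoY <- as.numeric(strftime(foo$RDate, format = "%j"))

# Passage of time: days since 1970-01-01, rescaled to a smaller range
foo$time <- as.numeric(foo$RDate) / 1000

print(foo)
```

Dividing by 1000 is just the rescaling suggested in the message; any monotone rescaling of the date would serve the same purpose.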

 There may be identifiability issues as the `s(time)` and `corCAR1()`
 compete to explain the fine-scale variation. If you hit such a case,
 you can make an educated guess as to the wiggliness (degrees of
 freedom) for the smooth terms based on a plot of the data and fix the
 splines at those values via argument `k = x` and `fx = TRUE`, where
 `x` in `k = x` is some integer value. Both these go in as arguments to
 the `s()` functions. If the trend is not very non-linear you can use a
 low value 1-3 here for x and for the DoY term say 3-4 might be
 applicable.

 There are other ways to approach this problem of identifiability, but
 that would require more time/space here, which I can go into via a
 follow-up if needed.

 You can interrogate the fitted splines to see when the peak value of
 the `DoY` term is in the year.

 You can also allow the seasonal signal to vary in time with the trend
 by allowing the splines to interact in a 2d-tensor product spline.
 Using `te(DoY, time, bs = c("cc", "cr"))` instead of the two `s()`
 terms (or using `ti()` terms for the two marginal splines and the
 2-d spline). Again you can add in `k = c(x, y), fx = TRUE` to the
 `te()` term where `x` and `y` are the dfs for each dimension in the
 `te()` term. It is a bit more complex to do this for `ti()` terms.

 Part of the reason to prefer a spline for DoY for the seasonal term is
 that one might not expect the seasonal cycle to be a symmetric cycle
 as a cos/sin terms would imply.

 A recent ecological paper describing a similar approach (though using
 different package in R) is that of Claire Ferguson and colleagues in J
 Applied Ecology (2008) http://doi.org/10./j.1365-2664.2007.01428.x
 (freely available).

 HTH

 G

 On 25 March 2014 19:14, Jacob Cram cramj...@gmail.com wrote:
  Hello all,
   I am thinking about applying season::cosinor() analysis to some
  irregularely spaced time series data. The data are unevenly spaced, so
  usual time series methods, as well as the nscosinor() function are out.
  My
  data do however trend over time

Re: [R-sig-eco] Vegan-Adonis-NMDS-SIMPER

2014-03-26 Thread Gavin Simpson
You mean `betadisper()`? This simply computes a multivariate
dispersion about the kth group centroid for k groups. If you can
express the levels within main effects as a factor variable defining
the groups then `betadisper()` could work with that, but I'm not quite
following what you want to do.

`adonis()` will test whether the groups means (defined by the
combinations of the levels of the covariate factors) differ.
`betadisper()` can test if there are different variances for the
same groups. If there are different variances, one might question the
results from `adonis()` if it indicated that the observed group means
was inconsistent with the hypothesis of equal group means. This
inconsistency may be due solely or in part to the heterogeneity of
dispersions (variances).
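
A hedged sketch of running the two tests side by side. It assumes vegan is installed, uses vegan's bundled `dune` data instead of the poster's congener matrix, and uses `adonis2()`, the current interface in vegan, instead of the `adonis()` call discussed in this thread:

```r
library(vegan)

data(dune)       # species abundances, 20 sites
data(dune.env)   # site covariates, including the factor Management

d <- vegdist(dune, method = "bray")

set.seed(1)
# Location: do the group centroids differ?
loc <- adonis2(d ~ Management, data = dune.env, permutations = 999)

# Dispersion: do the groups differ in spread about their centroids?
bd   <- betadisper(d, dune.env$Management)
disp <- permutest(bd, permutations = 999)

print(loc)
print(disp)
```

If the dispersion test suggests heterogeneous spread, a significant location result from the first test should be interpreted with the caution described above.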

Is that what you want to test/investigate?

G

On 26 March 2014 09:57, Brandon Gerig bge...@nd.edu wrote:
 Thanks for the words of caution on simper.

 Am I completely off base in thinking that betadiver function (analogous to
 Levene's test) could be used to examine variation between levels within
 main effects?

 Cheers


 On Mon, Mar 24, 2014 at 5:08 PM, Brandon Gerig bge...@nd.edu wrote:

 I am assessing the level of similarity between PCB congener profiles in
 spawning salmon and resident stream in stream reaches with and without
 salmon to determine if salmon are a significant vector for PCBs in
 tributary foodwebs of the Great Lakes.

 My data set is arranged in a matrix where the columns represent the
 congener of interest and the rows represent either a salmon (migratory) or
 resident fish (non migratory) from different sites.  You can think of this
 in a manner analogous to columns representing species composition and rows
 representing site.

 Currently, I am using the function Adonis to test for dissimilarity
 between fish species, stream reaches (with and without salmon) and lake
 basin (Superior, Huron, Michigan).
 The model statement is:

 m1 <- adonis(congener ~ FISH * REACH * BASIN, data = pcbcov, method = "bray", permutations = 999)

 The output indicates significant main effects of FISH, REACH, and BASIN
 and significant interactions between FISH and BASIN, and BASIN and REACH.

 Is it best to then interpret this output via an NMDS ordination plot or
 use something like the betadiver function to examine variances between main
 effect levels or both?

 Also,  can anyone recommend a procedure to identify the congeners that
 contribute most to the dissimilarity between fish, reaches, and basins?. I
 was thinking the SIMPER procedure but am not yet sold.

 Any advice is appreciated!
 --
 Brandon Gerig
 PhD Student
 Department of Biological Sciences
 University of Notre Dame




 --
 Brandon Gerig
 PhD Student
 Department of Biological Sciences
 University of Notre Dame




-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Cosinor with data that trend over time

2014-03-26 Thread Gavin Simpson
Sorry about the errors (typos, not syntax errors) - I was forgetting
that you'd need to use `gamm()` and hence access the `$gam` component

I don't follow the point about a factor trending up or down. You
shouldn't try to use the `$lme` part of the model for this.
`summary(mod$gam)` should be sufficient, but as it relates to a
spline, this is more a test of whether the spline is different from a
horizontal, flat, null line. The problem with splines is that the
trend need not just be trending up or down. In the past, to convey
where change in the trend occurs I have used the first derivative of
the fitted spline and looked for where in time the 95% confidence
interval on the first derivative of the spline doesn't include zero;
that shows the regions in time where the trend is significantly
increasing or decreasing. I cover how to do this in a blog post I
wrote:

http://www.fromthebottomoftheheap.net/2011/06/12/additive-modelling-and-the-hadcrut3v-global-mean-temperature-series/

the post contains links to the R code used for the derivatives etc,
though it is a little more complex in the case of a model with a trend
spline and seasonal spline.
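
A rough sketch of the finite-difference idea, not the code from the blog post. It assumes a single-smooth `gam()` fitted to simulated data (for a `gamm()` fit you would work with the `$gam` component, and with several smooths you would zero the columns of the other terms first). All object names are invented for the illustration:

```r
library(mgcv)

set.seed(42)
dat <- data.frame(time = seq(0, 10, length.out = 200))
dat$y <- sin(dat$time) + rnorm(200, sd = 0.2)

m <- gam(y ~ s(time), data = dat, method = "REML")

# Evaluate the basis at a grid and at the grid shifted by a small step
eps <- 1e-5
nd0 <- data.frame(time = seq(0.5, 9.5, length.out = 100))
nd1 <- data.frame(time = nd0$time + eps)
X0 <- predict(m, newdata = nd0, type = "lpmatrix")
X1 <- predict(m, newdata = nd1, type = "lpmatrix")

# Finite-difference approximation to the derivative of the basis
Xd <- (X1 - X0) / eps
deriv <- drop(Xd %*% coef(m))                 # estimated first derivative
se <- sqrt(rowSums((Xd %*% vcov(m)) * Xd))    # its pointwise standard error

# Approximate pointwise 95% interval; flag where it excludes zero
crit  <- qt(0.975, df.residual(m))
lower <- deriv - crit * se
upper <- deriv + crit * se
signif_change <- lower > 0 | upper < 0

head(data.frame(time = nd0$time, deriv, lower, upper, signif_change))
```

Stretches of `time` where `signif_change` is `TRUE` are the periods in which the fitted trend is increasing or decreasing by more than the pointwise uncertainty.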

I'm supposed to have updated those codes and the post because several
people have asked me how I do the analysis for models with multiple
spline terms. If you can't get the code to work for your models, ping
me back and I'll try to move that to the top of my TO DO list.

Note the `Xs(time)Fx1` entries in the `summary(mod$lme)` table refer
to the basis functions that represent the spline or at least to some
part of those basis functions. You can't really make much practical
use out of those values as they relate specifically to the way the
penalised regression spline model has been converted into an
equivalent linear mixed effect form.

HTH

G

On 26 March 2014 12:10, Jacob Cram cramj...@gmail.com wrote:
 Thanks again Gavin, this works.
 gamm() also models the long term trend with a spline s(Time), which is
 great. I would still like though, to be able to say whether the factor is
 trending up or down over time.  Would it be fair to query
 summary(mod$lme)$tTable
 and to look at the p-value and Value corresponding to Xs(time)Fx1 to
 identify such a trend?

 Also, here are a few syntax corrections on the code provided in the last
 email:
 1) visual approach
 plot(mod$gam, pages = 1)

 2) quantitative approach
 pred <- predict(mod$gam, newdata = newdat, type = "terms")
 take <- which(pred[, "s(DoY)"] == max(pred[, "s(DoY)"]))
 or
 take <- as.numeric(which.max(pred[, "s(DoY)"]))

 Cheers,
 -Jacob


 On Wed, Mar 26, 2014 at 9:46 AM, Gavin Simpson ucfa...@gmail.com wrote:

 1) Visually - unless it actually matters exactly on which day in the
 year the peak is observed? If visually is OK, just do `plot(mod, pages
 = 1)` to see the fitted splines on a single page. See `?plot.gam` for
 more details on the plot method.

 2) You could generate some new data to predict upon as follows:

 newdat <- data.frame(DoY = seq_len(366), time = mean(foo$time))

 Then predict for those new data but collect the individual
 contributions of the spline terms to the predicted value rather than
 just the final prediction

 pred <- predict(mod, newdata = newdat, type = "terms")

 Then find the maximal value of the DoY contribution

 take <- which(pred$DoY == max(pred$DoY))
 newdat[take, , drop = FALSE]

 You could use

 take <- which.max(pred$DoY)

 instead of the `which()` I used, but only if there is a single maximal
 value.

 This works because the spline terms in the additive model are just
 that; additive. Hence because you haven't let the DoY and time splines
 interact (in the simple model I mentioned, it is more complex if you
 allow these to interact as you then need to predict DoY for each years
 worth of time points), you can separate DoY from the other terms.

 None of the above code has been tested, but was written off top of my
 head, but should work or at least get you pretty close to something
 that works.

 HTH

 G

 On 26 March 2014 10:02, Jacob Cram cramj...@gmail.com wrote:
  Thanks Gavin,
   This seems like a promising approach and a first pass suggests it
  works with this data. I can't quite figure out how I would go about
  interrogating the fitted spline to determine when the peak value
  happens with respect to DoY.  Any suggestions?
  -Jacob
 
 
  On Tue, Mar 25, 2014 at 9:06 PM, Gavin Simpson ucfa...@gmail.com wrote:
 
  I would probably attack this using a GAM modified to model the
  residuals as a stochastic time series process.
 
  For example
 
  require(mgcv)
  mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo,
   correlation = corCAR1(form = ~ time))
 
  where `foo` is your data frame, `DoY` is a variable in the data frame
  computed as `as.numeric(strftime(RDate, format = "%j"))` and `time` is
  a variable for the passage of time - you could do `as.numeric(RDate)`
  but the number of days is probably large as we might encounter

Re: [R-sig-eco] Cosinor with data that trend over time

2014-03-25 Thread Gavin Simpson
-18,-3.04419201802053
 2003-10-22,-3.13805060873929
 2004-02-19,-3.80688269144794
 2004-03-17,-4.50755507726145
 2004-04-22,-4.38846502542992
 2004-05-19,-3.06618649442674
 2004-06-17,-5.20518774876304
 2004-07-14,-3.75041853151097
 2004-08-25,-3.67882486716196
 2004-09-22,-5.22205827512234
 2004-10-14,-3.99297508670535
 2004-11-17,-4.68793287601157
 2004-12-15,-4.31712380781011
 2005-02-16,-4.30893550479904
 2005-03-16,-4.05781773988454
 2005-05-11,-3.94746237402035
 2005-07-19,-4.91195185391358
 2005-08-17,-4.93590576323119
 2005-09-15,-4.85820800095518
 2005-10-20,-5.22956391101343
 2005-12-13,-5.12244047315448
 2006-01-18,-3.04854660925046
 2006-02-22,-6.77145858348375
 2006-03-29,-4.33151493849021
 2006-04-19,-3.36152357710535
 2006-06-20,-3.09071584142593
 2006-07-25,-3.31430484483825
 2006-08-24,-3.09974933041469
 2006-09-13,-3.33288992218458
 2007-12-17,-4.19942661980677
 2008-03-19,-3.86146499633625
 2008-04-22,-3.36161599919095
 2008-05-14,-4.30878307213324
 2008-06-18,-3.74372448768828
 2008-07-09,-4.65951429661651
 2008-08-20,-5.35984647704619
 2008-09-22,-4.78481898261137
 2008-10-20,-3.58588161980965
 2008-11-20,-3.10625125552057
 2009-02-18,-6.90675477864855
 2009-03-11,-3.43446932013368
 2009-04-23,-3.82688066341466
 2009-05-13,-4.44885332005661
 2009-06-18,-3.97671552612412
 2009-07-09,-3.40185954003936
 2009-08-19,-3.44958231694091
 2009-09-24,-3.86508161094726
 2010-01-28,-4.95281587967569
 2010-02-11,-3.78064756876257
 2010-03-24,-3.5823501176064
 2010-04-27,-4.3363571587
 2010-05-17,-3.90545735473055
 2010-07-21,-3.3147176517321
 2010-08-11,-4.53218360860017
 2010-10-21,-6.90675477864855
 2010-11-23,-6.90675477864855
 2010-12-16,-6.75158176094352
 2011-01-11,-6.90675477864855




-- 
Gavin Simpson, PhD



Re: [R-sig-eco] question about plot

2014-03-22 Thread Gavin Simpson
These lists only allow through a few types of attachments - yours
seemingly not among them. If this is just an image, consider using a
service like imgur http://imgur.com/ to supply the image and give the
link in the email.

G

On 21 March 2014 15:53, Luis Fernando García luysgar...@gmail.com wrote:
 Dear R friends,

 I have to produce a plot like the one attached on the file. The idea is to
 plot the time spent for a spider over several different prey, the spiders
 were repeated in the different trials. If any of you knows how to perform
 this plot or have any source which explains how to do it, I would really
 appreciate it.

 Thanks in advance.





-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Fwd: How to label arrows in Ordiplot 3d graphs

2014-03-05 Thread Gavin Simpson
[You do yourself no favours by spamming this list with multiple emails
in the same thread]

See `?orgltext` - you can ask for the biplot scores which is what we
label the arrows with on a constrained ordination.

HTH

G

On 5 March 2014 07:38, Rajendra Mohan panda rmp.iit@gmail.com wrote:
 Dear members

 I am doing multivariate analysis of my data using cca and rda using r
 (vegan package). However I am not able to label arrows in 3d ordination
 diagram. kindly advise.

 With best Regards
 Rajendra M Panda
 SWR, IIT KGP




-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Fwd: Error message while running anova in rda in r using vegan

2014-03-05 Thread Gavin Simpson
We see this from time to time and it usually means there is something
degenerate with a fit, but we haven't been able to get to the bottom
of it. Without reproducible examples I don't have the time to track
this down further.

HTH

G

On 5 March 2014 07:37, Rajendra Mohan panda rmp.iit@gmail.com wrote:
 Dear members

 I find continuous error message while running anova by axis in rda
 analysis using vegan in r as follows:
 Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'

 The code I used is anova(model, by = "axis")
 Similar code used for anova(model, by = "ma") works with no error

 I have gone through previous mails of this forum but I am confused what
 actually I should do with this error. Your advice is highly acknowledged.


 With best Regards
 Rajendra M Panda
 School of Water Resources,
 Indian Institute of Technology Kharagpur




-- 
Gavin Simpson, PhD



Re: [R-sig-eco] plotting models with confidence bands from glmer

2014-02-27 Thread Gavin Simpson
As with a GLM, I wouldn't expect there to be a difference between
prediction and confidence intervals for a GLMM. Also, and related, the
`interval` argument is particular to the `lm` method of `predict`.

Ben Bolker has an example for a particular lme4 model on his GLMM wiki
FAQ. Go to

http://glmm.wikidot.com/faq

and find the section

Predictions and/or confidence (or prediction) intervals on predictions

Another way that springs to mind would be to simulate from the
posterior distribution of the model parameters, but again, this would
require doing it by hand as to the best of my knowledge there is no
function in lme4 for this.
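
A sketch of the by-hand approach in the spirit of the FAQ entry, not a copy of it. It assumes lme4 is installed and uses lme4's bundled `cbpp` data with a simple binomial model in place of the poster's three-way interaction model. The intervals reflect uncertainty in the fixed effects only and ignore the random-effect variance:

```r
library(lme4)

gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)

# New data covering the levels of the fixed-effect covariate
newdat <- data.frame(period = factor(1:4, levels = levels(cbpp$period)))

# Fixed-effects design matrix for the new data
X <- model.matrix(~ period, data = newdat)

# Population-level linear predictor and its standard error
eta <- drop(X %*% fixef(gm))
V   <- as.matrix(vcov(gm))
se  <- sqrt(diag(X %*% V %*% t(X)))

# Approximate 95% band, back-transformed to the probability scale
ci <- data.frame(period = newdat$period,
                 fit    = plogis(eta),
                 lower  = plogis(eta - 1.96 * se),
                 upper  = plogis(eta + 1.96 * se))
print(ci)
```

For continuous covariates the same steps apply with `newdat` holding a grid of covariate values, and the `lower` and `upper` columns can then be drawn as a band around the fitted line.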

HTH

Gavin

On 27 February 2014 14:17, Cade, Brian ca...@usgs.gov wrote:
 Travis:  I wonder if you can modify the example from predict.lm to do
 something comparable (saw this posting recently) with mixed effects models
 from glmer().

 ?predict.lm

 Offers this example, which seems to meet the request

 x <- rnorm(15)
 y <- x + rnorm(15)
 predict(lm(y ~ x))
 new <- data.frame(x = seq(-3, 3, 0.5))
 predict(lm(y ~ x), new, se.fit = TRUE)
 pred.w.plim <- predict(lm(y ~ x), new, interval = "prediction")
 pred.w.clim <- predict(lm(y ~ x), new, interval = "confidence")
 matplot(new$x, cbind(pred.w.clim, pred.w.plim[,-1]),
 lty = c(1,2,2,3,3), type = "l", ylab = "predicted y")


 Brian

 Brian S. Cade, PhD

 U. S. Geological Survey
 Fort Collins Science Center
 2150 Centre Ave., Bldg. C
 Fort Collins, CO  80526-8818

 email:  ca...@usgs.gov brian_c...@usgs.gov
 tel:  970 226-9326



 On Thu, Feb 27, 2014 at 12:48 PM, Travis Belote travis_bel...@tws.org wrote:

 Hi all,

 I'm wondering if someone can help me figure out how to produce plots of
 model fits that include 95% CI bands for a generalized linear mixed model.
 I'm using glmer in lme4 to run essentially an ANCOVA to investigate a
 three-way interaction between one categorical variable (3 different
 species) and 2 continuous variables to investigate survival probability (0
 or 1) of trees.

 I've found a 3-way interaction between these variables. I have produced a
 stacked graph showing how the survival probabilities for different species
 (3 different lines) vary across a gradient of one of the variable and at 2
 levels of one of the other variables (shown by 2 panels). I used the
 parameter estimates to produce the predicted models as line graphs, but I'd
 like to add a confidence band around the models. I've been looking in Zuur
 et al's Mixed effects models and extensions in ecology in R, but wonder
 if someone has a trick to doing this.

 Thanks for any insights!
 Travis


 Travis Belote, Ph.D.
 Research Ecologist
 The Wilderness Society | Northern Rockies Regional Office
 503 W. Mendenhall, Bozeman, MT 59715
 office: 406.586.1600 x110 | cell: 406.581.3808




-- 
Gavin Simpson, PhD



Re: [R-sig-eco] How to accommodate data with negative values for Canonical Correspondence Analysis in R using vegan Package

2014-02-26 Thread Gavin Simpson
I'm still struggling to see why we'd expect temperature to be well
modelled via a species packing model???

G

On 26 February 2014 16:21, Chris Howden ch...@trickysolutions.com.au wrote:
 Whenever I hit this problem I add the minimum to all scores so they
 now range from 0 and above. If necessary I can back transform.

 As I haven't changed the ratio the parameters coefficients have the
 same meaning, so interpretation is no problem. (Although the intercept
 may now change and be difficult to interpret)

 This will work if the scale is a human construct where negative and
 positive numbers convey the same info, however if they mean different
 things it may not work. For example some indices have different
 meanings if negative.

 Chris Howden
 Founding Partner
 Tricky Solutions
 Tricky Solutions 4 Tricky Problems
 Evidence Based Strategic Development, IP Commercialisation and
 Innovation, Data Analysis, Modelling and Training

 (mobile) 0410 689 945
 (fax / office)
 ch...@trickysolutions.com.au

 Disclaimer: The information in this email and any attachments to it are
 confidential and may contain legally privileged information. If you are not
 the named or intended recipient, please delete this communication and
 contact us immediately. Please note you are not authorised to copy,
 use or disclose this communication or any attachments without our
 consent. Although this email has been checked by anti-virus software,
 there is a risk that email messages may be corrupted or infected by
 viruses or other
 interferences. No responsibility is accepted for such interference. Unless
 expressly stated, the views of the writer are not those of the
 company. Tricky Solutions always does our best to provide accurate
 forecasts and analyses based on the data supplied, however it is
 possible that some important predictors were not included in the data
 sent to us. Information provided by us should not be solely relied
 upon when making decisions and clients should use their own judgement.

 On 27 Feb 2014, at 1:39, Jari Oksanen jari.oksa...@oulu.fi wrote:

 It depends *where* to include negative values. Negative values are OK as 
 constraints (environmental variables, right hand side of the model formula). 
 However, all marginal sums of the response data (left hand side of the model 
 formula) must be above zero. It is technically possible to have some 
 negative entries in the response matrix as long as the marginal sums are 
 positive, but you really should not have them. *CA family of methods were 
 originally developed for non-negative data, and having negative entries 
 usually indicates that your data are not at all suitable for the method. If 
 you insist on the analysis, then you really must know what you are doing, 
 but if you really know, you do not need ask in R-sig-ecology.

 I still repeat: you can have negative data as constraints. If that fails, 
 then something else is wrong with your data.

 If you want to use CCA family of methods for negative response data, then 
 RDA with equal scaling of variables is usable. Very commonly negative values 
 also indicate that your response variables were measured in different scales 
 and units, and therefore you must set scale=TRUE in the rda() call. Not 
 knowing the data or any other details, this is a blind watchmaker 
 recommendation.
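
A minimal sketch of the two cases described above, using the `varespec` and `varechem` data that ship with vegan as stand-ins. The centred variable `pH_c` and the response matrix `resp` are made-up names for illustration only, not the original poster's data.

```r
library(vegan)
data(varespec, varechem)

# Centre pH so that the constraint takes negative values (pH_c is hypothetical)
env <- transform(varechem, pH_c = pH - mean(pH))
any(env$pH_c < 0)

# Negative values on the right-hand side of the formula are fine for cca()
m_cca <- cca(varespec ~ pH_c, data = env)

# Stand-in response with negative entries and mixed units: use rda() with
# scale = TRUE rather than a CA-family method
resp <- scale(varechem[, c("N", "P", "K")], scale = FALSE)
m_rda <- rda(resp ~ pH_c, data = env, scale = TRUE)
m_rda
```

With `scale = TRUE` each response variable is standardised to unit variance, which is what "equal scaling of variables" refers to here.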

 Cheers, Jari Oksanen
 
 From: r-sig-ecology-boun...@r-project.org 
 [r-sig-ecology-boun...@r-project.org] on behalf of Ivailo 
 [ubuntero.9...@gmail.com]
 Sent: 26 February 2014 16:23
 To: Rajendra Mohan panda
 Cc: r-sig-ecology@r-project.org
 Subject: Re: [R-sig-eco] How to accommodate data with negative values for 
 Canonical Correspondence Analysis in R using vegan Package

 On Wed, Feb 26, 2014 at 2:43 PM, Rajendra Mohan panda
 rmp.iit@gmail.com wrote:
 ...
 I have temperature data with negative values which I am not able to include
 for my CCA ordination. ...

 Rajendra, I am curious -- why are you not able to include the negative
 values in the CCA ordination?

 --
 The cure for boredom is curiosity. There is no cure for curiosity. --
 Dorothy Parker

 ___
 R-sig-ecology mailing list
 R-sig-ecology@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-ecology




-- 
Gavin Simpson, PhD

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] partial ordination

2014-01-29 Thread Gavin Simpson
That will work, as long as env is a matrix, not a data frame. An alternative is:

ppca <- rda(X = spe, Z = env)

G

On 24 January 2014 09:56, Duarte Viana viana.s...@gmail.com wrote:
 Hello all,

 I'm interested in applying an ordination method (e.g. PCA or NMDS) to
 a species matrix. However, I want to remove the potential effect of
 the environment, which I'm not interested in modelling, from the
 ordination (i.e. partialling out the environmental effect).

 I've thought in the following approach (with the vegan package):

 # spe: species matrix
 # env: environment matrix
 partial.pca <- rda(spe ~ 1 + Condition(env))
 plot(partial.pca)

 Is this a reasonable approach?

 Some feedback would be great.
 Thanks,

 Duarte




-- 
Gavin Simpson, PhD

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] NA error in envfit

2013-12-06 Thread Gavin Simpson
Phillip,

Your approach to using factors misses an important consideration: a
class that was observed in the full dataset should not disappear just
because you subsetted the data in some manner. Also, `droplevels()` is
a useful function to call on a factor or data frame if subsetting
produces levels with zero observations and that information is not
made use of in whatever computations follow.
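
A small base-R sketch of the behaviour being discussed; the factor `hab` and its levels are invented for illustration.

```r
# Hypothetical habitat factor with three levels
hab <- factor(c("bog", "fen", "fen", "heath", "bog"))

# Subsetting keeps the now-empty "heath" level
sub <- hab[hab != "heath"]
levels(sub)
table(sub)

# Drop the unused level only if the following computations need that
sub2 <- droplevels(sub)
levels(sub2)
```

Keeping the empty level is often what you want, for example so that tables from different subsets have the same layout.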

G

On 5 December 2013 10:42, Dixon, Philip M [STAT] pdi...@iastate.edu wrote:
 Kendra,

 I wonder if the problem is a factor level with no observations.  One of the 
 frustrating things about factors (class variables) in R is that the list of 
 levels is stored separately from the data.  This can cause all sorts of 
 problems if you create the factor, then subset the data, and the subset is 
 missing one or more levels of the factor.  You are subsetting your data, so 
 this may be the source of the problem.

 My working philosophy is to keep variables as character strings or numbers 
 until just before I need the factors.  That avoids any issues with extraneous 
 levels.  That means reading data sets (.txt or .csv files) with as.is=TRUE to 
 avoid default creation of factors.  relevel() may recreate the list of 
 levels.  I usually use factor(as.character(variable)) to flip a factor to a 
 vector of character strings then back to a factor with the correct set of 
 levels.

 Best wishes,
 Philip Dixon





-- 
Gavin Simpson, PhD

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Erro in ordiR2step

2013-11-14 Thread Gavin Simpson
The error says there is a problem with the formula used. Did you use a
formula to fit the model? If so, care to show it to us?

G

On 13 November 2013 14:19, amommendes amommen...@hotmail.com wrote:
 Dear List,

 I'm trying to select spatial variables with the ordiR2step function {vegan},
 but I got the following error:

 Error in terms.formula(formula, data = data) :
   invalid model formula in ExtractVars

 Could someone help me with this error? Could something be wrong with the
 structure of my data?

 Thanks in advance,

 Best,

 Amom








-- 
Gavin Simpson, PhD

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Missing species names in metaMDS object

2013-10-25 Thread Gavin Simpson
 ...
  $ EUPH.HELI: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ EUPH.OXYO: num [1:450] 0 0 0 1 0 1 0 0 0 0 ...
  $ EUPH.PEPL: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ FACT.ASCH: num [1:450] 0 0 0 0 0 0 1 0 0 0 ...
  $ FERU.COMM: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ FILA.CONT: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ FILA.PALA: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ FILA.PYRA: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ GAGE.COMM: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ GALI.DIVA: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ GALI.JUDA: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ GALI.SETA: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ GERA.MOLL: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ GERA.ROTU: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ GERO.HYBR: num [1:450] 0 1 0 1 1 0 0 0 0 0 ...
  $ HEDY.RHAG: num [1:450] 1 0 0 0 0 0 0 0 0 0 ...
  $ HELI.ROTU: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ HELI.SALI: num [1:450] 0 0 0 0 1 1 0 0 1 0 ...
  $ HIPP.UNIS: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ HIRS.INCA: num [1:450] 0 0 0 1 0 0 0 0 1 1 ...
  $ HORD.BULB: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ HORD.SPON: num [1:450] 0 0 0 1 1 0 0 0 0 0 ...
  $ HYME.CIRC: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ HYPA.HIRT: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ IRIS.PALA: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ ISAT.LUSI: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ LAGO.CUMI: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ LATH.APHA: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ LATH.BLEP: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ LATH.HIER: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
  $ LATH.MARM: num [1:450] 0 0 0 0 0 0 0 0 0 0 ...
   [list output truncated]
  - attr(*, "row.names")= int [1:450] 1 2 3 4 5 6 7 8 9 10 ...

 I have no idea why the species names are missing; there are no warnings. I
 only found questions regarding missing species scores, but that doesn't seem
 to be the problem here.

 Any idea?
 Thanks!
 Hila






-- 
Gavin Simpson, PhD

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] point color in NMDS (vegan)

2013-08-19 Thread Gavin Simpson
I have some posts on my blog that explain how to do this, in the vein
of Vit's reply (indexing via a factor). This way is preferable to that
of Jim's brute-force way, but might seem somewhat magical if you
aren't familiar with indexing rules and how factors are handled.

Anyway, here is the link to the blog post with working examples:

http://www.fromthebottomoftheheap.net/2012/04/11/customising-vegans-ordination-plots/
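
For reference, the factor-indexing idea can be sketched in base R as below. The island-to-site grouping follows the question, but the coordinates are random stand-ins; with vegan they would be the site scores of the fitted NMDS (for example from `scores(island.NMDS, display = "sites")`, assuming that object exists).

```r
# Grouping of the nine islands into three sites, as in the question
island <- c("A", "B", "C", "D", "E", "F", "G", "H", "J")
site <- factor(c(2, 1, 1, 1, 2, 2, 3, 3, 3),
               labels = c("site1", "site2", "site3"))

# One colour per factor level, in level order
cols <- c("blue", "green", "red")

# Indexing by a factor uses its integer codes, giving one colour per island
pt_col <- cols[site]

# Stand-in coordinates (hypothetical); replace with the NMDS site scores
set.seed(42)
xy <- matrix(rnorm(18), ncol = 2,
             dimnames = list(island, c("NMDS1", "NMDS2")))
plot(xy, type = "n", asp = 1)
text(xy, labels = island, col = pt_col)
```

The same `cols[site]` vector can be passed as `col` to `points()` or `text()` on an ordination plot.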

Enjoy,

G

On 18 August 2013 17:02, Elaine Kuo elaine.kuo...@gmail.com wrote:
 Dear List,



 This is Elaine.

 I am using metaMDS in package vegan to plot NMDS.

 However, I want to draw the points using different colors according to
 island sites.

 For example:

 island site 1: island B, C, D = blue

 island site 2: island A, E, F = green

 island site 3: island G, H, J = red



 Please kindly advise how to modify the following code to classify the sites
 by colors.

 Thank you



 Elaine



 library(MASS)

   library(vegan)

   island.NMDS <- metaMDS(island, k = 2, distfun = betadiver, distance = "sim",
                          trymax = 100, zerodist = "add")

   plot(island.NMDS, type = "n")



   # species as symbols

   points(island.NMDS, display = 'species', pch = '+', cex = 0.6)



   # sites as text

   text(island.NMDS, display = 'sites')




-- 
Gavin Simpson, PhD

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] point color in NMDS (vegan)

2013-08-19 Thread Gavin Simpson
Yes, look at ?dune.env for what it contains.

G

On 19 August 2013 16:45, Elaine Kuo elaine.kuo...@gmail.com wrote:
 Hello Gavin,

 Thanks for mentioning your blog.

 I would like to know what dune.env is, since it is very important to generate
 different colors.
 Is dune.env different from dune?
 What will the dune.env be in the case of metaMDS instead of rda?

 Thank you again

 Elaine



-- 
Gavin Simpson, PhD

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Plotting adonis (vegan) results

2013-08-10 Thread Gavin Simpson
Just one point Vit, re "...or choose another ordination technique". If by
that you meant you could use PCA or CA or NMDS etc., then I don't think that
will be useful. We use PCoA because the inputs to adonis(), betadisper(), etc.
are dissimilarity matrices and the aim is to show those dissimilarities
(and the group centroids/dispersions) as well as possible whilst preserving
the direct link to the distance values (i.e. why we don't use NMDS). I guess
you could use PCA for this if you have Euclidean distances, CA if you want
Chi-square etc., but not NMDS, as it works on ranks, so you'd get an odd
representation of what adonis() or betadisper() do.

As for the rest, you seem to have that right, though the cmdscale() call
might take a little bit more effort if you want to handle negative
Eigenvalues that arise with some dissimilarity metrics.
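
A base-R sketch of what "a little bit more effort" might involve, under the assumption that an additive-constant correction is acceptable for the analysis. The data are simulated, and Manhattan distances are used only as a convenient example of a dissimilarity that need not embed in Euclidean space.

```r
# Simulated stand-in community matrix (12 sites, 5 species)
set.seed(1)
x <- matrix(rpois(60, lambda = 3), nrow = 12)
d <- dist(x, method = "manhattan")

# Ask cmdscale() for the eigenvalues and inspect the smallest one;
# a clearly negative value signals the problem
pc <- cmdscale(d, k = 2, eig = TRUE)
min(pc$eig)

# One remedy: add a constant to the dissimilarities (add = TRUE)
pc_add <- cmdscale(d, k = 2, eig = TRUE, add = TRUE)
head(pc_add$points)
```

Whether to correct, or simply to plot the axes with positive eigenvalues, is a judgement call that depends on how large the negative eigenvalues are.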


On 10 August 2013 06:33, Vít Syrovátka syro...@sci.muni.cz wrote:

 Dear Jonas,
 I've just had a look into the paper (Friberg et al (2013)), but there is
 no information on what kind of ordination they used (at least it is very
 well hidden). Hard to believe, maybe I am wrong, but can't find anything
 about that. And they used different ordination techniques to visualise the
 groups with regard to the results of 'adonis()' and 'betadisper()' -
 compare Fig.1 and Fig. 2. Maybe that brought the confusion.

 The thing is that 'adonis()' tells you whether the groups consistently
 differ in their community composition, and then it's up to you to choose an
 ordination technique to visualize the groups. Probably that would be an
 ordination that approximates the same distances used in 'adonis()' (or
 'betadisper()'). The plot method of 'betadisper()' uses Principal
 Coordinate Analysis (PCoA), which you can use to visualize the results of
 'adonis()' as well (as Gavin and Jari pointed out), or you can do it on your
 own or choose another ordination technique.

 A small example with PCoA:

 ## taken from ?betadisper:
 library(vegan)
 data(varespec)

 # create the dist object
 dis <- vegdist(varespec)

 # create grouping variable; first 16 sites grazed, remaining 8 sites ungrazed
 groups <- factor(c(rep(1, 16), rep(2, 8)), labels = c("grazed", "ungrazed"))

 # calculate multivariate dispersions
 mod <- betadisper(dis, groups)

 # plot PCoA ordination with differentiated groups, i.e. visualize the
 # dispersions of groups in 2D ordination space
 plot(mod)

 ## now create the same plot on my own (though with different graphic
 ## parameters)
 # calculate the PCoA ordination sites' scores
 pcoa <- cmdscale(dis)
 # plot them with different symbols for groups
 plot(pcoa, asp = 1, pch = c(1, 16)[groups])
 # and add spiders and hulls
 ordispider(pcoa, groups)
 ordihull(pcoa, groups)

 If I am wrong in any point, will be glad if someone corrects me.
 Cheers,
 Vit


 Dne 2013-08-09 19:25, Gavin Simpson napsal:

  What Jari meant was if you use `betadisper()` on your data then it will
 give you an object that has a `plot()` method which will give you something
 of relevance to what `adonis()` and the whole PERMANOVA method is doing. It
 plots the group centroids and dispersions of the groups in principal
 coordinates space. I don't have easy access to that journal at the minute,
 but will look when I get back from this conference next week.

 You can see how `betadisper` and its `plot` method work in the meantime
 if you want to customise how things are plotted.

 Gavin

 On 9 August 2013 06:54, jonas.pers...@niva.no wrote:

  Hi all

 I'd like to plot my results from adonis (vegan). Internet searches found
 me an older post here in which Jari Oksanen wrote: "adonis() does not have
 plot method, but you can use betadisper for plots". But I'm afraid I
 don't understand how to do it this way.

 How can I a) make a plot of adonis results? b) manipulate the layout of
 adonis/betadisper figures to my liking?
 My main inspiration for both adonis and betadisper results plotting is
 Friberg et al (2013) Hydrol. Process. 27, 734-740, so I'd like to be able
 to produce something similar.

 Cheers,
 Jonas






-- 

Gavin Simpson, PhD


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Plotting adonis (vegan) results

2013-08-09 Thread Gavin Simpson
What Jari meant was if you use `betadisper()` on your data then it will
give you an object that has a `plot()` method which will give you something
of relevance to what `adonis()` and the whole PERMANOVA method is doing. It
plots the group centroids and dispersions of the groups in principal
coordinates space. I don't have easy access to that journal at the minute,
but will look when I get back from this conference next week.

You can see how `betadisper` and its `plot` method work in the meantime
if you want to customise how things are plotted.

Gavin

On 9 August 2013 06:54, jonas.pers...@niva.no wrote:

 Hi all

 I'd like to plot my results from adonis (vegan). Internet searches found
 me an older post here in which Jari Oksanen wrote: "adonis() does not have
 plot method, but you can use betadisper for plots". But I'm afraid I
 don't understand how to do it this way.

 How can I a) make a plot of adonis results? b) manipulate the layout of
 adonis/betadisper figures to my liking?
 My main inspiration for both adonis and betadisper results plotting is
 Friberg et al (2013) Hydrol. Process. 27, 734-740, so I'd like to be able
 to produce something similar.

 Cheers,
 Jonas





-- 

Gavin Simpson, PhD


___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] mgcv:gam predicted class?

2013-07-18 Thread Gavin Simpson
On Thu, 2013-07-18 at 16:13 +0200, Per Bergström wrote:
 How to get prediction of classes in gam?
 I have a problem with getting predicted classes from a gam-model using 
 the gam function in the mgcv-package.
 I have a dataset where I have classified the response variable into 5 
 classes (i. e. 1, 2, 3, 4, 5) and I want to use a gam-model generated on 
 a training dataset (traindata) to predict the class belonging of data 
 from a test-dataset (testdata) but I can't get it to work.
 
 I generated the gam-model using:
 model <- gam(growthclass ~ Vol + Expo + distance + .. + temperature,
              family = binomial, type = "classification", data = traindata)

The binomial would presume two classes (0,1) and there is no `type`
argument in `mgcv::gam()`, so the model you are fitting seems ill-fitted
(sorry) to the problem.

What you describe could be a multinomial model, but I don't know how
that can be fitted with splines.

It seems *you* have discretised the response. Why not fit the model on
the non-discrete response and then apply the classification to those
values, rather than the other way round, which is what you have done?
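
A rough sketch of that suggestion with simulated data: model the continuous response, predict, and only then cut the predictions into classes. The variable `growth`, the two covariates and the quantile-based class limits are assumptions for illustration; the real class limits would be whatever was used to create `growthclass`.

```r
library(mgcv)

set.seed(1)
# Hypothetical training and test data; growth is the continuous response
traindata <- data.frame(Vol = runif(200), temperature = runif(200, 5, 20))
traindata$growth <- 2 + 3 * traindata$Vol +
  sin(traindata$temperature / 3) + rnorm(200, sd = 0.3)
testdata <- data.frame(Vol = runif(20), temperature = runif(20, 5, 20))

# Fit the GAM to the non-discrete response
model <- gam(growth ~ s(Vol) + s(temperature), data = traindata,
             method = "REML")

# Predict on the response scale, then classify the predictions
pred <- predict(model, newdata = testdata, type = "response")
breaks <- quantile(traindata$growth, probs = seq(0, 1, by = 0.2))
breaks[c(1, 6)] <- c(-Inf, Inf)   # catch predictions outside the observed range
pred_class <- cut(pred, breaks = breaks, labels = 1:5)
table(pred_class)
```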

HTH

G

 When I then try to predict the class belonging of the test-dataset using:
 pred <- predict(model, newdata=testdata)
 I get results like this:
 
  2   4   6   7   9
 -1.93680872  2.28422570  0.99747563  0.03627236  0.74235160
 
 But I really want the results to look like this:
 
  2   4   6   7   9
   1   3   2   1   2
 
 
 i.e. the predicted classes for the new samples.
 
 Obviously I am doing something wrong and it is probably very simple to solve 
 but I've got stuck on it for a while and would appreciate some help solving 
 it.
 
 Thanks
 Per
 
 
 
 

-- 
Gavin Simpson, PhD                          [t] +1 306 337 8863
Adjunct Professor, Department of Biology    [f] +1 306 337 2410
Institute of Environmental Change & Society [e] gavin.simp...@uregina.ca
523 Research and Innovation Centre          [tw] @ucfagls
University of Regina
Regina, SK S4S 0A2, Canada

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Mixed effect (intercept) model with AR1 autocorrelation structure

2013-07-18 Thread Gavin Simpson

On Thu, 2013-07-18 at 14:34 +, Henkelman, Jonathan wrote:
 Gavin,
 
 Welcome to the prairies...

Cheers

<snip />

 I was up examining Zuur last night and found a passing reference to
 this problem on page 160: "It may be an option to extend the model with
 the AR1 correlation, with a random intercept on nest [a case which
 would correspond exactly to my scenario]. Such a model allows for the
 compound correlation between all observations from the same nest, and
 temporal correlation between observations from the same nest _and_
 night.  *But the danger is that the random intercept and
 auto-correlation will fight with each other for the same information.*"
  [* are mine...] I think this is exactly what is happening: the two
 correlation specifications are fighting with each other and the mixed
 intercept is losing.

Exactly; I've had the same problem modelling smooth trends with residual
correlations.

 I have had many suggestions about other ways to model this dataset.  I
 have been and will continue to pursue these ideas (I'd also like to
 build a simple Bayesian model as Zuur does in chapter 23 to deal with a
 similar problem).  However, I have not had anyone comment directly on
 my two main questions:
 1) is this analysis doable or is nlme perhaps the wrong package?; and 
 2) Is this a reasonable question to ask?  

Sorry, I missed one important aspect of approach 2) (from my earlier
email). What I suggested was to fit the random effects model without the
autocorrelation structure, then add the appropriate correlation
structure in a second run after looking at the ACF/PACF to determine
what order ARMA to use. What I should have said is that you will have to
set the parameters of the ARMA model explicitly in `corARMA()` (see
`?corARMA` for details on how to do that).

To actually do this then you'll have to estimate an ARMA for the
normalised residuals from the random effects fit. You'll have to do this
*within* the Plots. That will give many models (`Plot` models) and
you'll need to choose common parameter values from the set of fits -
`corARMA()` fits the same structure within each level of Plots. So the
resulting values you plug into `corARMA()` will be say an average of the
coefficients derived from the individual Plot-level ARMA fits. Note this
won't be perfect, but as Zuur et al note, the idea is not to model the
covariance structure perfectly but to get something that is reasonable.
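
The procedure above could be sketched roughly as follows, assuming an AR(1) structure for simplicity. Everything about the data (`dat`, `day`, `avg`, the simulation settings) is made up for illustration, and averaging the per-plot coefficients is only one way to pick a common value.

```r
library(nlme)

set.seed(1)
# Simulated stand-in data: 12 plots, 60 daily readings each (all hypothetical)
n_plot <- 12
n_day <- 60
dat <- data.frame(
  plot = factor(rep(seq_len(n_plot), each = n_day)),
  day  = rep(seq_len(n_day), times = n_plot),
  Tmt  = factor(rep(rep(c("C", "W"), each = n_day), times = n_plot / 2))
)
plot_eff <- rnorm(n_plot, sd = 1)
ar_noise <- unlist(lapply(seq_len(n_plot), function(i)
  as.numeric(arima.sim(list(ar = 0.6), n = n_day))))
dat$avg <- 10 + (dat$Tmt == "W") + plot_eff[dat$plot] + ar_noise

# Step 1: random-intercept model with no correlation structure
m0 <- lme(avg ~ Tmt, random = ~ 1 | plot, data = dat)

# Step 2: estimate an AR(1) coefficient within each plot from the
# normalised residuals, then average over plots
r <- resid(m0, type = "normalized")
phi_by_plot <- tapply(r, dat$plot, function(x)
  as.numeric(ar(x, order.max = 1, aic = FALSE)$ar))
phi <- mean(unlist(phi_by_plot))

# Step 3: refit with the coefficient supplied and held fixed
m1 <- lme(avg ~ Tmt, random = ~ 1 | plot, data = dat,
          correlation = corARMA(value = phi, form = ~ day | plot,
                                p = 1, q = 0, fixed = TRUE))
summary(m1)$tTable
```

Because `fixed = TRUE` stops the optimiser from re-estimating the coefficient, the random-intercept variance no longer competes with the AR term during fitting.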

Beyond that, you're into Bayesian modelling to have fine control, or do
as Brian Cade suggests and put everything in the fixed effects if you
have enough DFs.

Fitting these models (LMEs with correlation structures) can get very
hard; it is nice when they work without much effort on your part,
but very often the optimisation heads off down the wrong path and the
only way to correct that is to estimate some things outside lme() and
then plug them back into lme() as known.

 I think question 2) is answered given the citation above -- it is
 valid conceptually.  Perhaps nlme IS the wrong package, or there is a
 bug in nlme since the information _seems_ to be there -- can anyone
 comment on this.

This isn't a bug, it is just a feature (infelicity) of the optimisation
algorithm that is choosing parameters. It is trying to choose values for
the variance of the random effects and the parameters for the ARMA part.
It obviously got a better fit by saying that the residual variance was
better fitted by the AR (or ARMA) than by putting some in the random
effects. Hence estimating the ARMA coefficients after fitting the
random effect model, and then using these as fixed values in `corARMA()`,
might allow the model to be fitted.

HTH

G

   But I do still wonder about supposition 3: Is it valid to use a
 two-step approach: run a basic autocorrelation model to remove the
 autocorrelation effect (normalized residuals shown in Figs 1,2), then
 run a mixed intercept model on the residuals to deal with the nested
 structure (and also the heterogeneity between plots).  This runs,
 gives reasonable results that agree with the graphs, but is it
 statistically valid?
 
 Thanks,
 Jonathan
 
 From: Gavin Simpson [gavin.simp...@ucl.ac.uk]
 Sent: July 17, 2013 10:06 PM
 To: Henkelman, Jonathan
 Cc: r-sig-ecology@r-project.org; Johnstone, Jill
 Subject: Re: [R-sig-eco] Mixed effect (intercept) model with AR1 
 autocorrelation structure
 
 On Wed, 2013-07-17 at 16:10 +, Henkelman, Jonathan wrote:
  Perhaps I should clarify.  There is a time-series trend -- daily
  temperature fluctuates randomly throughout the summer.  But there is
  not a clear long-term signal.  I have modelled the time-series effect
  using a gam to see if that can adequately compensate for the effect.
  However, I believe this is a fundamentally flawed approach:
 
  1) We are not interested in modelling the time-series; it merely is a
  way of estimating the temperature response in plot.  That is, I don't
  want to ask

Re: [R-sig-eco] Mixed effect (intercept) model with AR1 autocorrelation structure

2013-07-17 Thread Gavin Simpson
... and it is easy to see the values line up with those shown in Figure 1.
 ===
 Plots <- as.character(levels(df$plot))
 M <- vector(length=12); for (i in 1:12) M[i] <- mean(En[df$plot==Plots[i]]); M
 [1]  0.833 -0.436  0.126 -0.849  0.536  0.077  0.877  0.240 -1.674  1.419  0.277 -1.146
 S <- vector(length=12); for (i in 1:12) S[i] <- sd(En[df$plot==Plots[i]]); S
 [1] 1.453  0.704  0.400  0.648  0.594  0.523  0.551  1.014  0.934  0.639  0.634  0.794
 ===
 
 Also a mixed effect model run on the residuals shows there is information 
 that can yet be extracted: that is, there is still a strong mixed effect but 
 no fixed effect left in the residuals.
 ===
 df.gls <- df; df.gls$avg <- resid(mod.gls, type = "normalized")
 summary(mod.MEgls <- lme(avg ~ Tmt, random = ~1 | plot, data = df.gls))
 
 Random effects:
  Formula: ~1 | plot
         (Intercept)  Residual
 StdDev:   0.9317715 0.7889055
 
 Fixed effects: avg ~ Tmt
                  Value Std.Error   DF    t-value p-value
 (Intercept)  0.1529750 0.3818732 1092  0.4005911  0.6888
 TmtW        -0.2589626 0.5400503   10 -0.4795157  0.6419
 ===
 
 3. Perhaps the problem is that I am trying to specify two different types of 
 correlation structure and they just don't agree: the induced correlation 
 structure from the mixed effect on intercept (compound symmetric); whereas I 
 am also specifying an AR1 correlation structure with the correlation 
 argument.  Still, it should be possible to say, remove the AR1 correlation 
 structure, then I assume all the readings within a plot are more correlated 
 to each other than to other plots (as is shown in Figure 1), that is, a 
 compound symmetric structure. This is functionally what has been done in the 
 last code block with the analysis of the residuals.  But is this a valid 
 method?  I feel both structures apply in a theoretical sense; it's just a 
 question of how to specify this in R.
 
 4. Finally, an easy one, can anyone comment on the strong correlation between 
 the intercept and treatment parameters (-0.707)?  Is this a problem, or does 
 it just reflect the nature of the model? It does not go away when I center 
 the temperature data...
 
 Thanks for your thoughts...
 Jonathan Henkelman
 
 
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Mixed effect (intercept) model with AR1 autocorrelation structure

2013-07-17 Thread Gavin Simpson
On Wed, 2013-07-17 at 16:10 +, Henkelman, Jonathan wrote:
 Perhaps I should clarify.  There is a time-series trend -- daily
 temperature fluctuates randomly throughout the summer.  But there is
 not a clear long-term signal.  I have modelled the time-series effect
 using a gam to see if that can adequately compensate for the effect.
 However, I believe this is a fundamentally flawed approach:
 
 1) We are not interested in modelling the time-series; it merely is a
 way of estimating the temperature response in plot.  That is, I don't
 want to ask the question, can I predict the temperature given the
 treatment and the date, but rather, is there a treatment effect.  My
 inference ability is badly reduced if I model with a gam.

It is *irrelevant* whether you want to model the time series. You *must*
model it either via the fixed effects or in the covariance matrix of the
residuals.

You have two options:

1) include a time effect (either linear or spline) plus possibly a
simple time series model for the residuals (AR(1) would be a start).
Your null would include the time effect thus you can assess the effect
of Treatment relative to this null.

2) proceed as you did but use a much more complex ARMA model for the
residuals. AR(1) is clearly inappropriate with such a high \phi
estimate.

Which is most useful will depend on the data; it sounds as if you don't
have a deterministic trend as you seem to be describing a stochastic
trend. However, I'd be very surprised if you didn't have a cyclic
temperature effect with temperatures varying smoothly through the year
in a seasonal signal/cycle, with noise superimposed. But I am just
guessing.

Perhaps start with the mixed effects and **no** correlation structure if
you don't want to do 1). Extract the normalised residuals and look at
the ACF and the PACF of these. That will help identify what order ARMA
model might be needed. Then add the correlation structure with
corARMA().
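
As a sketch of that diagnostic step, here is the same workflow on the `Ovary` data that ship with nlme, which stand in for the temperature series. The model formula is only an example, not a recommendation for the data in this thread.

```r
library(nlme)

# Random-intercept model with no correlation structure
m0 <- lme(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time),
          random = ~ 1 | Mare, data = Ovary)

# Empirical autocorrelation of the normalised within-group residuals
acf_tab <- ACF(m0, maxLag = 10, resType = "normalized")
acf_tab
plot(acf_tab, alpha = 0.05)

# PACF of the normalised residuals for a single group, as a rough guide
# to the AR order
r <- resid(m0, type = "normalized")
first_mare <- levels(Ovary$Mare)[1]
pacf(r[Ovary$Mare == first_mare], main = "PACF, first mare")
```

A slowly decaying ACF with a PACF that cuts off after lag p points towards an AR(p) term; the chosen structure can then be added through the `correlation` argument.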

I didn't notice you were at USask; I moved to the UofR a few months ago.
If it would help to discuss offlist, my UofR contact details are in the
footer.

HTH

G

 2) The current analysis is for a single season.  In a few years we
 will be re-running this analysis of 5 years of data.  I do not expect
 the random fluctuation in seasonal temperature will be the same each
 year.  Hence, while this analysis sort of works now, it won't in the
 future.  However, it seems reasonable to model the autocorrelation
 effect within the time-series as constant through time.
 
 3) When I look at the process of temperature I can say, yes today is
 more likely to be similar to yesterday than the day before.  There is
 autocorrelation and random fluctuation, hence it makes sense to model
 it this way.  For the record, a simple AR1 model accounts better for
 the seasonal fluctuation than a gam, and my ARMA(2,0) model does an
 even better job.
 
 Hope this helps, J
 

-- 
Gavin Simpson, PhD                          [t] +1 306 337 8863
Adjunct Professor, Department of Biology    [f] +1 306 337 2410
Institute of Environmental Change & Society [e] gavin.simp...@uregina.ca
523 Research and Innovation Centre          [tw] @ucfagls
University of Regina
Regina, SK S4S 0A2, Canada

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] ordipointlabel with shortened names

2013-06-20 Thread Gavin Simpson
Resending as I had some mail trouble yesterday and don't see this one to
have gone through. See in-line below...

On Tue, 2013-06-18 at 12:58 -0700, Kevin McCluney wrote:
 Hi,
 
 I've been trying to use ordipointlabel() to add taxa names to an nmds
 (metaMDS) graph in VEGAN.  I can add the full names from the database, but I
 would like to use the shortened names I created using make.cepnames().  I've
 tried:
 
 pl3v2 <- ordipointlabel(mds, dis = "sp", add = TRUE, lab = shnam)

I'm afraid this is not possible yet - the labels are hard coded from the
species scores. Hence I suggest you do

names(foo) <- make.cepnames(names(foo))

where foo is your data frame. Then refit the NMDS.

I'll see about allowing the passing in of labels but it won't happen for
a few weeks.
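
A minimal sketch of the workaround above, assuming vegan is installed.
The BCI data set that ships with vegan stands in for the data frame
`foo` (that substitution, and the choice of the 20 commonest taxa, are
assumptions made only to keep the example small):

```r
library(vegan)

# BCI has full Latin names as column names; it plays the role of `foo`
data(BCI)
foo <- BCI[, order(colSums(BCI), decreasing = TRUE)[1:20]]

# Abbreviate the species (column) names to CEP-style short names
names(foo) <- make.cepnames(names(foo))

# Refit the NMDS so that the species scores carry the short names
set.seed(1)
mds <- metaMDS(foo, trace = FALSE)

# The labels drawn by ordipointlabel() now come from the short names
plot(mds, type = "n")
pl <- ordipointlabel(mds, display = "species", add = TRUE)
```

Because the labels are taken from the species scores, renaming before
fitting is what makes the short names appear on the plot.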

HTH

G

 But I get the following error message:
 
 Error in text.default(lab, labels = labels, col = col, cex = cex, font =
 font,  : 
   graphical parameter lab has the wrong length
 In addition: Warning message:
 In text.default(lab, labels = labels, col = col, cex = cex, font = font,  :
   NAs introduced by coercion
 
 I've also tried labels instead of lab and I get:
 
 Error in text.default(lab, labels = labels, col = col, cex = cex, font =
 font,  : 
   formal argument labels matched by multiple actual arguments
 
 I guess this function doesn't yet work like orditorp?  Can anyone think of a
 workaround?
 
 I've tried orditkplot:
 
 orditkplot(pl3v2, dis = "sp")
 
 But I get this error:
 
 Error in structure(.External(dotTclObjv, objv, PACKAGE = tcltk), class =
 tclObj) : 
   [tcl] bad screen distance -NaN.
 
 I've also tried using identify(), but I have a few taxa that are literally
 right on top of each other and try as I might, I can't get all of the taxa
 that are on top of each other to show up.
 
 Any help would be appreciated.  Thanks!
 
 Kevin E. McCluney, PhD
 Post-doctoral Research Scholar
 Department of Entomology
 North Carolina State University
 
 
 
 



Re: [R-sig-eco] simple question about nmds

2013-06-10 Thread Gavin Simpson
You can use them, but you shouldn't use them individually. Directions in
NMDS aren't like the independent (orthogonal) components of PCA or CA.
You need the distances in *both* (assuming k = 2, *all* for k > 2) NMDS
dimensions to recover the solution. Also directions across and up/down
the NMDS diagram are arbitrary - we can rotate the solution and the
solution remains the same.
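
The point about arbitrary directions can be checked with a small
base-R sketch. The random configuration below is only a stand-in for a
set of 2-d NMDS site scores (an assumption for illustration): rotating
it leaves every between-site distance unchanged, while the scores on
each individual axis change.

```r
set.seed(42)

# Stand-in for 10 sites on 2 NMDS axes (made-up numbers)
pts <- matrix(rnorm(20), ncol = 2)

# Rotate the whole configuration by 60 degrees
theta <- pi / 3
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
pts_rot <- pts %*% rot

# Between-site distances are identical before and after rotation ...
same_dist <- isTRUE(all.equal(c(dist(pts)), c(dist(pts_rot))))
same_dist

# ... but the "axis 1" scores are no longer the same variable
cor(pts[, 1], pts_rot[, 1])
```

This is why a single NMDS axis taken on its own has no fixed meaning.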

What vegan does here is to use the coordinates in NMDS dimensions 1 and
2 (or 1,...,k) as predictors for a single response variable. This is a
bit silly - the model then says that the locations in the ordination
space *influence* the response variable! But if there is such a
relationship, it suggests the converse, that the response varies over
the gradients in the ordination.

G

On Mon, 2013-06-10 at 08:37 +, Paolo Piras wrote:
 Hi,
 actually I think you CAN use them (as predictors or response either)
 because, albeit in its unique way, NMDS **IS** a method to summarize a
 highly multidimensional phenomenon.
 best
 paolo
 
 From: r-sig-ecology-boun...@r-project.org [r-sig-ecology-boun...@r-project.org] 
 on behalf of Simone Ruzza [simone.ruzz...@gmail.com]
 Sent: Monday 10 June 2013 10:26
 To: r-sig-ecology@r-project.org
 Subject: Re: [R-sig-eco] simple question about nmds
 
 Apologies, I re-phrase what I have said before: I would be interested in
 using the site scores within a multiple regression setting, but with the
 scores as the response rather than as predictors.
 
 Best wishes,
 
 Simone
 
 
 On Mon, Jun 10, 2013 at 9:19 AM, Simone Ruzza <simone.ruzz...@gmail.com> wrote:
 
  Dear list,
 
  apologies for the total beginner's question. I was wondering whether
  one can use the site scores of an NMDS ordination to do further
  analyses as typically done for other ordination methods, e.g. use the
  axis scores as predictors in a multiple regression setting. I think
  it should not be possible, because the aim of NMDS is not to summarize
  the major patterns of variation within a multivariate dataset. Do
  you confirm?
 
  thanks in advance,
 
  Simone
 
 
 



Re: [R-sig-eco] Random Forest classification on species counts

2013-06-05 Thread Gavin Simpson
On Thu, 2013-05-30 at 20:36 +, Hall, Kyle wrote:
 First time poster, please forgive me for errors.
 
 I have a data set of 23 sites with 145 different species counts for
 macroinvertebrate communities for a given year. each species is
 represented at least once per site and there are a lot of 0's for some
 species. I have been applying a variety of vegan functions to the data
 set to get a better understanding of the structure and I would like to
 classify sites based on species using randomForest. My thought is that
 this will give me a more understandable classification based on
 species that I can use to cluster my sites and also see which species
 are of more importance in classification.
 
 Question 1. Am I barking up the wrong...tree (pun intended) with
 randomForest for this purpose?

Yes - I doubt unsupervised RF would give you anything more than you
could get from a suitably-chosen dissimilarity matrix or even ordination
to check if you actually have clusters.

For supervised RF, even if you had a classification, you have far too few
sites/samples to warrant a machine learning tool.

 With PCA two sites are typically separated from the rest but the other
 21 sites show no discernible structure; spread like white noise over
 both axes.
 When I perform NMDS I tend to get a shot gun look and there are not
 tight groupings on these reduced axes. However with Wards clustering
 (in hclust) I do see some clusters plotting heavily to one side of an
 axis or another (albeit spread wide on the orthogonal axis).

Ward's clustering, as with any clustering, will find clusters - it
[Ward's method] tends to find compact, spherical ones IIRC and hence
often looks convincing. Your job is to demonstrate that the clustering
into k clusters explains more of the variance in the model than no
clustering. Simply eye-balling the dendrogram is not a solution to this.
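
To illustrate why the dendrogram alone is not evidence, here is a
base-R sketch on pure noise (23 made-up "sites" by 5 made-up
variables, with k = 3 chosen arbitrarily; none of this is the poster's
data). Ward's method still returns three tidy groups, and those groups
still "explain" a share of the total sum of squares:

```r
set.seed(123)

# Structureless data: 23 sites, 5 variables of uniform noise
noise <- matrix(runif(23 * 5), nrow = 23)

# Ward clustering and a cut into k = 3 groups
hc  <- hclust(dist(noise), method = "ward.D2")
grp <- cutree(hc, k = 3)
table(grp)

# Share of the total sum of squares "explained" by the groups
tot    <- sum(scale(noise, scale = FALSE)^2)
within <- sum(sapply(split(as.data.frame(noise), grp),
                     function(g) sum(scale(g, scale = FALSE)^2)))
explained <- 1 - within / tot
explained
```

Any real clustering therefore needs to be judged against what noise
like this produces, not against zero.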

 Question 2. Is it possible that my data set just doesn't have enough
 structure to neatly classify Sites by species count or am I simply a
 newbie that is applying randomForest incorrectly?

With so few data I wouldn't bother with machine learning tools - they are
designed to work with hundreds and thousands or more samples.

HTH

G

 Example of data structure:
 Site    ABLA.MAL  ABLA.PAR  ACEN.SPP  ACRO.MEL
 MC14A          1         1         2         0
 MC17           4         2         0         0
 MC22A          8         0         0         0
 MC25          13         3         0         0
 MC27           0         0         0         0
 MC29A1         1         0         0         0
 MC30A          1         0         0         0
 MC31A          4         1         0         0
 MC31B          4         0         0         0
 MC33           8         0         0         0
 MC38           7         0         0         0
 MC40A         12         3         0         0
 MC42           0         0         0         0
 MC45           9         0         0         0
 MC47A          0         0         0         0
 MC49A          5         0         0         0
 MC50           2         0         0         1
 MC51          13         0         0         0
 MC66           4         0         0         0
 MY11B         13         1         0         0
 MY13           0         0         0         0
 MY7B           1         0         0         0
 MY8            3         2         0         1
 
 
 This is my call to randomForest:
 
 FY09BUGS.rF <- randomForest(Site ~ ., data = FY09Bugs, ntree = 500,
     mtry = sqrt(ncol(FY09Bugs)), replace = TRUE, importance = TRUE,
     proximity = TRUE, norm.votes = TRUE, keep.forest = TRUE, do.trace = 100)
 
 I am following the iris data example with my formula but the print data on 
 FY09BUGS.rf returns 100% OOB error rate and the summary returns:
 
 summary(FY09BUGS.rF)
                 Length Class  Mode
 call               11  -none- call
 type                1  -none- character
 predicted          23  factor numeric
 err.rate        12000  -none- numeric
 confusion         552  -none- numeric
 votes             529  matrix numeric
 oob.times          23  -none- numeric
 classes            23  -none- character
 importance       3625  -none- numeric
 importanceSD     3480  -none- numeric
 localImportance     0  -none- NULL
 proximity         529  -none- numeric
 ntree               1  -none- numeric
 mtry                1  -none- numeric
 forest             14  -none- list
 y                  23  factor numeric
 test                0  -none- NULL
 inbag               0  -none- NULL
 terms               3  terms  call
 
 One concern I have is that the iris example does not appear to give a 
 training data set and so I don't believe I have done that either. I feel like 
 there is potential here but I can't seem to find the solution searching 
 online so I put the questions to you! Thanks in advance for any assistance or 
 constructive criticism.
 
 Kyle
 
 
 Kyle Hall .
 City of Charlotte Storm Water Services
 Water Quality Modeler
 600 East Fourth Street
 Charlotte, NC 28202
 704.336.4110
 Fax: 704.353.0473
 
 


Re: [R-sig-eco] anova.cca question / missing data in constraining matrix

2013-06-05 Thread Gavin Simpson
On Tue, 2013-06-04 at 08:23 -0700, ckellogg wrote:
 Hi Jari,
 I have a final (hopefully) question.  I have narrowed down the number of
 environmental variables and removed some of the community (and associated
 env. data) rows if they are missing too many environmental variables.
 Then, when I run the anova.cca by terms I get the following error message.
 anova(m2,by='terms',perm=999)
 Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'

This pops up from time to time and IIRC relates to numerical issues for
the particular matrix - for example you are fitting too complex a model
for the data to support. It usually indicates something slightly on the
border of being ill-conditioned.

You can try reordering the data to see if that helps. Scaling the data
in some way may also help. Also try debugging the function call to see
in which of the terms the function is failing.
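
As a rough illustration of the scaling suggestion, the base-R sketch
below builds three made-up predictors on wildly different scales (all
names and values are assumptions, not the poster's data) and compares
the condition number of the raw and the standardised matrices. A very
large condition number is one sign of the kind of numerical trouble
that can upset the SVD:

```r
set.seed(1)

# Hypothetical environmental variables measured on very different scales
env <- data.frame(a = rnorm(30, mean = 1e4,  sd = 1e3),
                  b = rnorm(30, mean = 0.01, sd = 0.001),
                  c = rnorm(30))

# Condition number before and after centring and scaling to unit variance
k_raw    <- kappa(as.matrix(env), exact = TRUE)
k_scaled <- kappa(scale(env), exact = TRUE)
c(raw = k_raw, scaled = k_scaled)
```

Scaling will not rescue a model with more terms than the data can
support, but it removes one avoidable source of ill-conditioning.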

HTH

G

 any thoughts about why this is happening or how I can avoid it so that I
 can run the anova by terms.
 
 Thank you so much for all of your help with this!
 
 colleen
 
 
 
 On Tue, Jun 4, 2013 at 1:59 AM, Jari Oksanen [via r-sig-ecology] 
 ml-node+s471788n7578184...@n2.nabble.com wrote:
 
  Dear Colleen,
  On 03/06/2013, at 22:32 PM, ckellogg wrote:
 
   Hello Jari,
   Thank you for your help with this.  The solution you suggested in your
   second post worked quite well.  However, i think another subset of my
   data is too 'holey', because when I run CCA on this set of environmental
   variables (or the a CCA with the previous environmental variables and
   the additional ones), I get an error:
  
   toolik250.cca2 <- cca(toolikotus250.ra ~ logtemp + conductivity + pH +
       logBacProd + DIC + logDCO2 + sqrtDCH4 + logDOC + sqrtPhosphate +
       sqrtNitrate + sqrtTDN + sqrtTDP + logPC + logPN + Ca + Mg + logNa +
       logK + SO4 + logChloride + Silica, toolikenv.s,
       na.action = na.exclude)
   Error in predict.cca(x, newdata = excluded, type = "wa", model = "CA") :
     model “CA” has rank 0
  
   The CCA runs if I use na.action=na.omit, but then when I run the anovas,
   there is apparently no residual component.  For example,
   No residual component
  
   Model: cca(formula = toolikotus250.ra ~ logtemp + conductivity + pH +
   logBacProd + DIC + logDCO2 + sqrtDCH4 + logDOC + sqrtPhosphate +
   sqrtNitrate + sqrtTDN + sqrtTDP + logPC + logPN + Ca + Mg + logNa +
   logK + SO4 + logChloride + Silica, data = toolikenv.s,
   na.action = na.omit, subset = -toolik250.cca2$na.action)
             Df  Chisq F N.Perm Pr(>F)
   Model     12 5.3003        0
   Residual   0 0.
  
  Yes, probably too many holes. You have no residual variation which
  indicates that the number
  of predictor variables (constraints) is higher than the number of
  remaining observations.
 
  Cheers, Jari Oksanen
 
   So, I am thinking that examining the relationship between the microbial
   community and this subset of environmental variables might not be
   possible without my first manually curating which samples and variables
   should be included, correct?
  
   Thank you,
   Colleen
  
  
  
 
  --
  Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
  Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa
 

Re: [R-sig-eco] Mantel() with strata

2013-05-24 Thread Gavin Simpson
On Thu, 2013-05-16 at 08:39 +, SILVIA CALVO ARANDA wrote:
 Hi all
 I'm computing a Mantel test between species
 dissimilarity and geographic distance between all pairwise island comparisons
 in three archipelagos (with 9, 7 and 3 islands respectively) within the same
 biogeographical region. I'm using mantel() from the vegan package, and I was
 wondering if I should use the strata argument to control for the archipelago
 effect. Does anyone know if using archipelago as a random effect with only 3
 levels is enough to obtain reliable estimates of variance? I don't think so.
 Instead, if in the same scatter plot of species dissimilarity vs geographic
 distance I mark differently the points corresponding to within-archipelago and
 between-archipelago comparisons, how can I test (through permutations) the
 statistical significance of both correlations?
 Any input on this issue would be welcome!
 Thank you in advance

`strata` controls the way the permutation test is performed. It is
essentially a block term whereby permutations are derived by random
permutation of samples *within* each block (i.e. samples are *not*
shuffled between blocks). This preserves the clustering of the data
whilst perturbing the exact link between the geographic and
compositional distances. `strata` is therefore usually a factor
variable.

This isn't really a random effect; whereas one probably wouldn't use a
variable with only three levels as a random effect there is no reason
not to use it for `strata` in the permutation test if the null
distribution you are testing needs to retain the archipelago-based
clustering of samples.
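
A minimal sketch of such a restricted test, assuming vegan is
installed. The coordinates and community matrix are simulated (19
hypothetical islands in archipelagos of 9, 7 and 3); only the use of
`strata` is the point here:

```r
library(vegan)
set.seed(10)

# Hypothetical design: 19 islands in three archipelagos
arch <- factor(rep(c("A", "B", "C"), times = c(9, 7, 3)))

# Made-up coordinates (archipelagos offset along x) and species counts
xy   <- cbind(x = rnorm(19) + as.integer(arch) * 10, y = rnorm(19))
comm <- matrix(rpois(19 * 12, lambda = 3), nrow = 19)

geo <- dist(xy)                        # geographic distances
bc  <- vegdist(comm, method = "bray")  # compositional dissimilarities

# Islands are permuted only *within* their own archipelago
m <- mantel(bc, geo, permutations = 999, strata = arch)
m
```

With `strata` set, the null distribution keeps the archipelago
grouping intact, which is the restriction described above.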

HTH

G



Re: [R-sig-eco] r2 values from envfit in NMDS

2013-05-17 Thread Gavin Simpson
On Wed, 2013-05-15 at 12:45 -0400, Eric Niederhauser wrote:
 Hello,
 
 Please forgive my ignorance if this is really basic.
 
 I am using NMDS and envfit in the package vegan to determine the
 environmental variables contributing to the differences between two
 distinct groups of sites and to determine if the two groups of sites are in
 fact distinct.
 
 Envfit provides r2 values that, if I understand it correctly, quantify the
 degree to which the environmental variables influence the distribution of
 the sites (points) in the ordination space. They don't however reflect the
 degree to which the variables affect the separation of the distinct site
 groups.  An environmental variable may have a high r2 value but be
 perpendicular to the axis of greatest group separation.
 
 Is my understanding of the r2 values provided by envfit correct? Is there a
 way to quantify the contribution of environmental variables to site group
 separation in NMDS? Or should I separately use something like randomForest?

Kind of, though your explanation is back to front. The model fitted is

x_i = \beta_1 Ax_{ji} + \beta_2 Ax_{ki} + \varepsilon

i.e. we say the axis scores on axes j and k (Ax) affect the values of
the environment. And as such we reverse this silly statement and say
that if there is such a relationship (between axis scores and
environment) then the environment explains, to some degree, the
dissimilarity between sites.
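
The regression above can be reproduced by hand as a check. This is a
sketch assuming vegan is installed and using its varespec/varechem
example data with the variable Al (both choices are mine, not the
poster's). For an NMDS, where all sites have equal weight, the r2
reported by envfit for a vector should match the R-squared of an
ordinary regression of the variable on the two axis scores:

```r
library(vegan)
data(varespec, varechem)

set.seed(1)
ord <- metaMDS(varespec, trace = FALSE)

# envfit for a single environmental variable, no permutation test
fit <- envfit(ord, varechem[, "Al", drop = FALSE], permutations = 0)

# The same model written out: environment ~ axis 1 scores + axis 2 scores
sc <- scores(ord, display = "sites")
m  <- lm(varechem$Al ~ sc[, 1] + sc[, 2])

r2_envfit <- unname(fit$vectors$r)
r2_lm     <- summary(m)$r.squared
c(envfit = r2_envfit, lm = r2_lm)
```

Nothing in this fit refers to groups of sites, which is why the r2
says nothing about group separation.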

Note that NMDS, like most other ordination methods, is focussed on
sites, not groups of sites. Hence no one ever claimed that it was
designed to best separate groups of sites. envfit simply fits vectors
into k-d ordination configurations; again it knows nothing of groups.

If we *had* a discriminant analysis method in vegan, which does set out
to best separate groups (under certain conditions), then that ordination
may be doing what you want but envfit would still not care about the
groups; it would simply project a vector into the resulting ordination
space.

If you want to discriminate between groups that you have defined a
priori, then a random forest is one of the machine learning tools that
might usefully be applied.

HTH

G



Re: [R-sig-eco] NMDS and envfit

2013-05-15 Thread Gavin Simpson
On Wed, 2013-05-15 at 14:42 +0300, adrien wrote:
 Hello,
 
 for the function envfit in package vegan, the manual says that "The 
 function fits environmental vectors or factors onto an ordination", but 
 since there are scores for both sites and species, does envfit correlate 
 the environmental factors to the sites or to the species NMDS scores? Or 
 to both, at the same time? Or am I missing something?

There are really only site scores in nMDS, and anyway, all versions of
envfit would work on the site scores. So only the chosen axis scores are
used in the envfit process.

As for the species scores; metaMDS adds these as a convenience feature
if the species data is available to it. They are added as a weighted
average of the site scores and then possibly expanded. 
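
A small sketch of the weighted-averaging step itself, assuming vegan
is installed and using its varespec data. It computes, for each
species, the abundance-weighted mean of the site scores by hand and
compares that with vegan's wascores(). Note that metaMDS may use
transformed abundances as weights and may expand the result, so these
values are not claimed to equal the species scores stored in the
metaMDS object:

```r
library(vegan)
data(varespec)

set.seed(1)
ord     <- metaMDS(varespec, trace = FALSE)
site_sc <- scores(ord, display = "sites")

# By hand: for each species, sum(abundance * site score) / sum(abundance)
wa_manual <- t(as.matrix(varespec)) %*% site_sc / colSums(varespec)

# The same calculation with vegan's helper, without expansion
wa_vegan <- wascores(site_sc, varespec, expand = FALSE)

max_diff <- max(abs(wa_manual - wa_vegan[, 1:2]))
max_diff
```

Each species therefore sits at the centre of gravity of the sites in
which it occurs; the site scores are the only fitted quantities.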

G

 thanks for any input on this!
 
 adrien




Re: [R-sig-eco] FW: betadisper

2013-05-14 Thread Gavin Simpson
On Thu, 2013-05-09 at 14:50 +, SILVIA CALVO ARANDA wrote:
 Dear all
 
 I'm using betadisper () to evaluate if geographical and historical
 heterogeneity differ statistically between two archipelagos. I have
 not problems in obtaining the results for the geographical hypothesis
 (computing the Euclidean distance between UTM centroids of each
 island), but for the historical hypothesis the following message
 appears:
 Error en vectors[, want, drop = FALSE] %*% diag(sqrt(abs(eig))) : 
   argumentos no compatibles

Assuming I translate that correctly, it means `vectors[, want, drop =
FALSE]` and `diag(sqrt(abs(eig)))` have dimensions that do not match for
the requirements of the matrix multiplication operation.
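
For readers unfamiliar with this error, here is a base-R sketch of one
way such a mismatch can arise. It is an illustration only, not a
diagnosis of the poster's data: when `diag()` is given a single
number, it returns an identity matrix of that size rather than a
1 x 1 matrix.

```r
# Toy stand-ins: one retained axis for 5 samples and one eigenvalue
vectors <- matrix(rnorm(5), ncol = 1)   # 5 x 1
eig     <- 9

# diag(3) is a 3 x 3 identity matrix, not a 1 x 1 matrix holding 3
d <- diag(sqrt(abs(eig)))
dim(d)

# 5 x 1 times 3 x 3 cannot be multiplied: non-conformable arguments
res <- try(vectors[, 1, drop = FALSE] %*% d, silent = TRUE)
inherits(res, "try-error")

# Forcing a 1 x 1 matrix makes the product well defined
d1 <- diag(sqrt(abs(eig)), nrow = 1)
dim(vectors %*% d1)
```

Whether this is what happens with the historical distances would need
the actual data and code to confirm.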

Can you send me the data and code you are using - **off list** - and
I'll take a look and see what is going wrong?

G

 I calculate the historical triangular matrix computing the Euclidean
 distance between the oldest geological age of each island (all of them
 are from volcanic origin). I've also tried calculating just the
 difference in mya between all pairwise comparisons of islands, which
 gives basically the same results (but with different scaling).
 Can anybody help me with this problem?
 Thank you in advance
 Silvia
   
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R-sig-eco] RE : CCA vs NMDS and ordisurf

2013-04-22 Thread Gavin Simpson
I would say that it *is* important, in general. However, you don't say
if you retried running `monoMDS` on the Hellinger transformed data
(without the Bray-Curtis metric - you should use Euclidean with
Hellinger transformation)? If you didn't, try rerunning without
Bray-Curtis and see if it converges. Otherwise, try many more iterations
and get vegan to start monoMDS from the best solution from the first set
of runs.

See `?metaMDS` for details.
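
A sketch of both suggestions, assuming vegan is installed and using
its varespec data in place of the poster's (the trymax values are
arbitrary choices): Hellinger-transform, use Euclidean distances with
the automatic transformation switched off, then restart from the best
solution found so far.

```r
library(vegan)
data(varespec)

# Hellinger transformation, to be paired with Euclidean distance
hel <- decostand(varespec, method = "hellinger")

set.seed(1)
sol <- metaMDS(hel, distance = "euclidean", autotransform = FALSE,
               trymax = 50, trace = FALSE)

# Continue the search, starting from the best solution of the first run
sol2 <- metaMDS(hel, distance = "euclidean", autotransform = FALSE,
                trymax = 100, previous.best = sol, trace = FALSE)

c(first = sol$stress, second = sol2$stress)
sol2$converged
```

The second call can only keep or improve on the earlier best stress,
so repeating it is a cheap way to keep searching.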

G

On Mon, 2013-04-22 at 08:26 +, Aurélie Boissezon wrote:
 Hello everybody!
 I didn't imagine that my questions will lead to such a debate among 
 researchers :) . It helps me to get ready for future reviewers' comments.  ;)
 Just a question still opened about NMDS (Gavin?):
 Is it important to reach a convergent solution, given that the best solution 
 always ordinates the species in a similar way? As I said, even with stricter 
 criteria the analysis doesn't reach a convergent solution.
 
 Best regards,
 
 Aurélie
 
 ---
 Aurélie Rey-Boissezon
 Ph-D Student
 University of Geneva
 Section of Earth and Environmental Sciences - Institute F.-A. Forel
 Aquatic Ecology Group
 Uni Rondeau
 Site de Battelle - Bâtiment D
 7, route de Drize - 1227 Carouge
 Geneva
 Switzerland
 Tel. 0041 (0) 22379 04 88
 
 aurelie.boisse...@unige.ch
 http://leba.unige.ch/team/aboissezon.html
 
 From: fgill...@gmail.com [fgill...@gmail.com] on behalf of François Gillet 
 [francois.gil...@univ-fcomte.fr]
 Sent: Saturday 20 April 2013 10:59
 To: Gavin Simpson
 Cc: Aurélie Boissezon; r-sig-ecology@r-project.org
 Subject: Re: [R-sig-eco] RE : CCA vs NMDS and ordisurf
 
 
 2013/4/19 Gavin Simpson <gavin.simp...@ucl.ac.uk>
 I really don't see why this has to be an either/or situation.
 
 I fully agree: direct and indirect gradient analyses are complementary! Sorry 
 for not having stressed that in my short answers...
 
 François
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] extract distance from betadisp {vegan}

2013-04-22 Thread Gavin Simpson
The distance to centroid for a site isn't a measure of that site's alpha
diversity. It is a reflection (an approximation) of the compositional
similarity of the sample to the other samples; distances between sites
reflect compositional dissimilarity.

The values you want are in the `$distances` component of the object
returned by `betadisper`.

I should add those as one of the options that `scores.betadisper`
extracts for you.
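
A minimal sketch of the extraction, assuming vegan is installed. It
uses the varespec data and the grazed/ungrazed grouping from the
betadisper help page rather than the poster's data, and Shannon
diversity is just one possible choice of alpha diversity:

```r
library(vegan)
data(varespec)

dis    <- vegdist(varespec)   # Bray-Curtis by default
groups <- factor(c(rep(1, 16), rep(2, 8)),
                 labels = c("grazed", "ungrazed"))

mod <- betadisper(dis, groups)

# One distance-to-centroid per sample
d <- mod$distances
head(d)

# Set against a per-sample alpha diversity measure
alpha <- diversity(varespec, index = "shannon")
cor(d, alpha)
```

The vector `d` is in the same order as the rows of the community
data, so it can be placed alongside any other per-sample measure.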

But I'm not convinced from what you write in your email that this
comparison is warranted nor that it will be fruitful nor enlightening.

HTH

G

On Mon, 2013-04-22 at 22:17 +, Mitchell, Kendra wrote:
 I've run betadisp on a set of communities and would now like to
 compare the distance for each sample from its centroid to other
 measures such as alpha diversity.  Basically I want to check that
 increased dispersion isn't simply a reflection of overall diversity.
 It seems like I should be able to pull that out of the disp object but
 I haven't figured out how.  thanks
 
 Kendra
 
 --
 Kendra Maas Mitchell, Ph.D.
 Post Doctoral Research Fellow
 University of British Columbia
 


Re: [R-sig-eco] RE : CCA vs NMDS and ordisurf

2013-04-19 Thread Gavin Simpson
A contrary view in-lined below:

On Fri, 2013-04-19 at 15:19 +0200, François Gillet wrote:
 A lot of questions, some responses below...
<snip />
  Why not explore unconstrained ordination methods and went further with
  NMDS (V2 mission in Anderson et al 2011)?
 
 
 Just because your purpose is to explain community structure by
 environmental variables (a regression-oriented question). Direct gradient
 analysis (especially with RDA and adjusted R-square) is in this case more
 powerful than indirect gradient analysis (from NMDS or any other
 unconstrained ordination).

I think you need to justify the "more powerful" there! :-) I see uses
for both the constrained and unconstrained methods here. A comparison,
especially if you do PCA vs RDA (with Hellinger or similar
transformation) or PCoA vs capscale (with any distance measure) allows
you to investigate the degree to which your constraints relate to the
major patterns in the species responses.
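
A sketch of such a comparison, assuming vegan is installed and using
its varespec/varechem data; the three constraints (Al, P, K) are
picked arbitrarily for illustration:

```r
library(vegan)
data(varespec, varechem)

hel <- decostand(varespec, method = "hellinger")

# Unconstrained: PCA of the Hellinger-transformed species data
pca <- rda(hel)

# Constrained: RDA of the same data on a few environmental variables
rda_fit <- rda(hel ~ Al + P + K, data = varechem)

# Share of total variance on the first two PCA axes ...
eigenvals(pca)[1:2] / pca$tot.chi

# ... versus the share that the constraints account for
rda_fit$CCA$tot.chi / rda_fit$tot.chi
RsquareAdj(rda_fit)
```

If the constrained share is small relative to the leading PCA axes,
the main patterns in the species data lie elsewhere than along the
measured variables.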

These are complementary approaches and one would do well to use them
both.

   I understood that I was wrong when using Bray-Curtis distance on
  hellinger transformed data before NMDS, I have to choose. But that I am
  right when superimposing vector or gam surface on NMDS ordinations.
 
 
 That's right, but you can fit a GAM model on RDA results as well!

You can, but the axes are still formed through linear functions of the
constraints. The constrained methods don't fit non-linear functions
(well you can introduce quadratic terms...) in the constraints.
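
As a sketch of the quadratic-term option, assuming vegan is installed
and again using varespec/varechem with Al chosen arbitrarily: the
constraint enters either linearly or as a second-degree polynomial,
and the variance captured by the two models can be compared.

```r
library(vegan)
data(varespec, varechem)

hel <- decostand(varespec, method = "hellinger")

# Axes are linear combinations of the constraints as supplied
lin  <- rda(hel ~ Al, data = varechem)

# A quadratic term lets the fitted response to Al bend
quad <- rda(hel ~ poly(Al, 2), data = varechem)

c(linear = lin$CCA$tot.chi, quadratic = quad$CCA$tot.chi)
```

The quadratic model is still linear in its (expanded) constraints; it
simply has an extra column to work with.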

I really don't see why this has to be an either/or situation.

G

 Cheers,
 
 François
 
 
 ---
 Prof. François Gillet
 Université de Franche-Comté - CNRS
 UMR 6249 Chrono-environnement
 UFR Sciences et Techniques
 16, Route de Gray
 F-25030 Besançon cedex
 France
 http://chrono-environnement.univ-fcomte.fr/
 http://chrono-environnement.univ-fcomte.fr/spip.php?article530
 Phone: +33 (0)3 81 66 62 81
 iPhone: +33 (0)7 88 37 07 76
 Location: La Bouloie, Bât. Propédeutique, -114L
 ---
 Editor of Plant Ecology and Evolution
 http://www.plecevo.eu
 ---
 


Re: [R-sig-eco] quantifying directed dependence of environmental factors

2013-03-11 Thread Gavin Simpson
On Thu, 2013-03-07 at 11:13 -0800, Rich Shepard wrote:
 On Thu, 7 Mar 2013, Philippi, Tom wrote:
 
  I would look at packages bio.infer, paltran, fossil, and analogue, and
  search to see if anyone has pushed them in the direction you want to go.

To this list I would add Steve Juggins' excellent rioja package. In
addition to several WA methods it also includes maximum likelihood
regression and calibration in the flavour of bio.infer.

 bio.infer is based on the EPA's EMAP-West (Environmental Monitoring and
 Assessment Program for the western states) and uses benthic macroinvertebrates
 and fish with selected water chemistry parameters. It uses the ITIS
 (Integrated Taxonomic Information System) to provide consistency in
 naming taxa to the lowest reasonable level.

As far as I can tell, bio.infer contains all you say but as higher-level
utility functions.

However, IIRC at the heart of bio.infer is what we call maximum
likelihood regression and calibration; fit a Gaussian logistic
regression to each species to characterise species-env relationships,
then invert this set of models to find the value of the environmental
variable that maximises the likelihood of observing a sample of new
counts over the set of species. Invariably, the inversion involves
numerical optimisation to search for the value of the env that made the
new counts most likely.
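
The two steps can be sketched in base R. This is not bio.infer's code
or API; it is a simplified illustration with simulated
presence/absence data (rather than counts), three hypothetical species
and a made-up gradient, and every name in it is an assumption:

```r
set.seed(2)
n   <- 300
env <- runif(n, 4, 9)            # training gradient, e.g. pH (made up)
opt <- c(5, 6.5, 8)              # hypothetical species optima

# Simulate presence/absence with a unimodal response on the logit scale
Y <- sapply(opt, function(u) rbinom(n, 1, plogis(2 - (env - u)^2 / 2)))

# Regression step: one Gaussian logistic model per species
fits <- lapply(seq_along(opt), function(j) {
  d <- data.frame(y = Y[, j], env = env)
  glm(y ~ env + I(env^2), family = binomial, data = d)
})

# Calibration step: log-likelihood of a new sample at a candidate value x
loglik <- function(x, ynew) {
  p <- vapply(fits, function(f)
    predict(f, newdata = data.frame(env = x), type = "response"),
    numeric(1))
  sum(dbinom(ynew, size = 1, prob = p, log = TRUE))
}

# New sample in which only the mid-gradient species is present
ynew <- c(0, 1, 0)
est  <- optimize(loglik, interval = c(4, 9), ynew = ynew, maximum = TRUE)
est$maximum
```

The numerical search in the last step is the "inversion": it looks for
the environmental value under which the fitted species models make the
new sample most likely.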

You just need to give mlsolve() the relevant data objects, which seem to
be somewhat easy to create by hand if you don't need to look-up
harmonised or correct taxon names. You really don't need all the nice
ITIS hand-holding, though I'm sure it is very handy for those working on
relevant species groups.

G

 Conceptually, one could assemble equivalent dataframes for diatom taxa and
 environmental conditions, but I don't know if ITIS has plants/algae in the
 system; probably does. However, the biota-environments relationships would
 be based on current conditions and whether this would be valid for sediment
 core data would need to be judged by a limnologist, not a stream ecologist
 like me.
 
 Rich
 


Re: [R-sig-eco] quantifying directed dependence of environmental factors

2013-03-11 Thread Gavin Simpson
 on diatom composition (using constrained ordination or the
additive model time series approach I used). Then I would model the
relationship between the higher-level factor (land-use, etc) that I
hypothesise to be driving changes in the lower-level,
physiologically-relevant, variable.

Whilst path analysis might be the ideal tool here, I suspect you'll have
plenty of complications to address, not least the compositional aspects
(though Sarah points you to some ideas there), but also the temporal
autocorrelation prevalent in the palaeo data, which would need to be
accounted for to yield appropriate p-values. If you do address these
issues and use path analysis, I for one would be very interested to hear
how you got on, as this would be a very useful contribution to the
palaeolimnological literature!

Wow, this got long! Anyway, hopefully some of the above will be of use
and interest, and should you wish to chat about this off-list some more
(I'm certainly very interested if you use path analysis for this), then
please do so. (Note I am still sending via my old UCL address as I have
moved to Canada and U Regina, contact details in my email signature.)

All the best,

Gavin

On Wed, 2013-03-06 at 22:12 -0500, Jay Kerns wrote:
 Hello,
 
 I'm posting to this list because I believe it's the best place to
 go.  My question is R related only inasmuch as all the work I've
 done so far has been with R and I expect any answers I get from
 here will lead me to more R work.
 
 I'm consulting with an ecologist and an engineer on a project
 related to a reservoir nearby.  They've collected data on diatoms
 in the reservoir via core samples; they have sections of data over
 the past 100yrs.  They are looking at the community structure
 plus other environmental factors over the same time period.
 
 We've done a ton of work already and there's no point trying to
 hash all of that out here.  Short story: we did an NMDS, it fits
 OK (stress 0.17), there are obvious clusters in the ordination
 which correspond to a-priori clusters from ecological
 considerations (and which match an independent cluster analysis),
 we're really quite pleased overall.  We checked for relationships
 with =envfit=, most environmental variables are *highly*
 significant, yet there are a couple which aren't significant at
 all.  Here comes my question:
 
 The ecologist pointed out to me that our environmental variables
 don't have equal status (ecologically speaking); some variables
 lead to others.  For instance, there are so-called ultimate
 factors (population, percentage farmland) which contribute to
 intermediate factors (suspended solids, total phosphorus) which
 in turn contribute to direct factors (AREA, pH,...) which then in
 turn contribute to diatom structure.
 
 We have measured data on all the above and several more.  The
 model we are fitting with =envfit= is symmetric in those n
 environmental variables, but the ecology of the situation isn't
 symmetric, it's a directed top-down kind of relationship.  He
 asked me, How can we quantify that?  How can we demonstrate
 that?  Can we quantify/demonstrate that?  I don't know.
 
 There are ecologists on this list: what am I looking for, here?
 What methods do ecologists use to answer this (or related)
 question(s)?  Feel free to direct me to papers, literature,
 textbooks, whatever.  I'm trying to help answer this question
 and (this not being my subject specialty) I'm at a bit of a loss.
 
 If there are relevant R packages/vignettes/manuals you can point
 me to, that'd be cool too.
 
 Thanks for reading all the way down to here.
 
 Jay
 
 P.S. If it hadn't been for the archives of this list containing
 lengthy and poignant answers to *several* questions I've had
 already then I couldn't even have made it this far.  Thank you!
 
 
 

-- 
Gavin Simpson, PhD                          [t] +1 306 337 8863
Institute of Environmental Change & Society [f] +1 306 337 2410
523 Research and Innovation Centre          [e] gavin.simp...@uregina.ca
University of Regina                        [tw] @ucfagls
Regina, SK S4S 0A2, Canada



Re: [R-sig-eco] ordipointlabel for triplots

2013-03-07 Thread Gavin Simpson
On Thu, 2013-03-07 at 10:48 -0400, carolina monmany wrote:
 Hi all,
 
 
 I would like to use ordipointlabel() to plot a RDA but the function
 only displays species and sites. How can I include environmental
 variables as vectors?
 
 
 Thanks!

You can use

text(ord, display = "bp")

(where `ord` is your fitted RDA object) before or after the
ordipointlabel() call to add the vectors with labels.
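
For a self-contained illustration, here is a minimal sketch using the varespec and varechem data that ship with vegan. The choice of constraints (N, P, K) and the arrow colour are arbitrary assumptions made for the example only.

```r
library(vegan)
data(varespec, varechem)

# A small RDA with three numeric constraints (chosen only for illustration)
ord <- rda(varespec ~ N + P + K, data = varechem)

pdf(NULL)                                            # no file is written
ordipointlabel(ord, display = c("sites", "species")) # optimised site/species labels
text(ord, display = "bp", col = "blue")              # add biplot arrows with labels
dev.off()
```

Note that ordipointlabel() only optimises the positions of the labels it draws itself, so the arrow labels added by text() may still overlap some of them.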

HTH

Gavin



Re: [R-sig-eco] nMDS plot with points of different size

2013-03-07 Thread Gavin Simpson
On Thu, 2013-03-07 at 15:37 +, Mark Fulton wrote:
 There's probably something off the shelf, but
 I use a little function written in R:
 
 #A function for doing an xyz bubbleplot.
 bubbleplot <- function(x, y, z, bmax=4, bmin=.5) {
   plot(x, y, type="n")
   z <- z-min(z); z <- z/max(z)
   for (i in 1:length(x)) {
     points(x[i], y[i], cex=bmin+(bmax-bmin)*z[i])
   }
 }

`points()` is vectorised and can take numeric vectors for relevant
arguments. Hence you don't need the `for()` loop here. Just generate the
vector of cex values you want and pass the whole thing plus all
coordinates to `points()`:

cex <- bmin + (bmax - bmin) * z
points(x, y, cex = cex)

should be sufficient.
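
Putting that together, a loop-free version of the function might look like the sketch below. The helper name bubble_cex() is made up for the example; the scaling is the same as in Mark's function.

```r
# Rescale z to [0, 1] and map it onto the range [bmin, bmax] of cex values
bubble_cex <- function(z, bmin = 0.5, bmax = 4) {
  z <- (z - min(z)) / (max(z) - min(z))
  bmin + (bmax - bmin) * z
}

# Empty plot, then all points in a single vectorised call
bubbleplot <- function(x, y, z, ...) {
  plot(x, y, type = "n")
  points(x, y, cex = bubble_cex(z, ...))
}

pdf(NULL)                                   # no file is written
bubbleplot(x = c(1, 2, 3), y = c(2, 1, 3), z = c(0, 5, 10))
dev.off()

bubble_cex(c(0, 5, 10))   # 0.50 2.25 4.00
```

Since cex scales the linear size of the symbol, passing sqrt(z) rather than z would be one way to make the area, rather than the diameter, track abundance.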

HTH

G

 This just makes an empty plot, and draws circles in it
 scaled to values in z.  You'll need to extract the axis 
 scores from the NMDS output to use this.  Tweak
 other graphics parameters as needed to get what you
 want.
 
 Dr. Mark Fulton
 Professor of Biology
 Bemidji State University
 Bemidji, MN   56601
 http://faculty.bemidjistate.edu/mfulton/
 
 
 -Original Message-
 From: r-sig-ecology-boun...@r-project.org 
 [mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of Stas Malavin
 Sent: Thursday, March 07, 2013 12:07 AM
 To: r-sig-ecology@r-project.org
 Subject: [R-sig-eco] nMDS plot with points of different size
 
 Dear list members,
 
 I want to plot an nMDS diagram with points' area proportional to the
 abundance of a particular species. I could imagine just plotting with
 type = "n" and then using points() with different cex, but maybe some special
 functions/packages exist for that which you can point me to?
 
 Thank you,
 Stas
 
 
 
 Junior Res Asst
 Hydrobiology Lab
 Institute of Limnology
 Russian Academy of Sciences
 
 Sevastyanova 9
 196105 Russia, St Petersburg
 http://www.limno.org.ru
 Phone: +7 (812) 387-80-60
 Fax: +7 (812) 388-73-27
 


Re: [R-sig-eco] species scores for ordination in vegan

2013-02-25 Thread Gavin Simpson
On Sun, 2013-02-24 at 11:22 +0400, Stas Malavin wrote:
 Dear list readers,
 
 Please forgive me for asking trivial questions, but I've failed to find the
 exact answer by myself. I'm interested in how the species scores are
 calculated for ordination configuration in vegan package. It's written they
 are weighted averages (of sites scores?): vegan/html/metaMDS.html,
 vegan/html/wascores.html.
 
 My questions are:
 (1) Are all sites used for calculation of the average, or just those where
 the species is present?

As the weight for an absent species is 0, the contributions of those
absent species at any given site are zero. So in one sense, they are not
used, but the computations themselves certainly use them; their effect
is just zero. Which way you want to think of this depends on what you
mean by "calculated".

 (2) What's used as weights: site totals, or the abundances of the particular
 species whose score is calculated?

The species abundances. The fitted species score for species i is the
weighted average of the site scores for all sites k with weights given
by the abundance of species i on those sites k.
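
As a small worked example of that weighted average, using made-up abundances and site scores on a single axis (base R only):

```r
# Hypothetical community matrix: 4 sites (rows) by 3 species (columns)
comm <- matrix(c(5, 0, 2,
                 0, 3, 1,
                 1, 1, 0,
                 0, 4, 6),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("site", 1:4), c("spA", "spB", "spC")))

# Hypothetical site scores on one ordination axis
site_scores <- c(-1.0, 0.5, 0.0, 1.5)

# Species score = sum(abundance * site score) / sum(abundance);
# sites where the species is absent contribute zero to both sums.
wa <- colSums(comm * site_scores) / colSums(comm)
wa

# The same thing for a single species via weighted.mean()
weighted.mean(site_scores, w = comm[, "spB"])   # 0.9375
```

In vegan the equivalent computation is done by wascores(); as far as I recall it can additionally expand the scores (its `expand` argument), so the numbers from metaMDS() need not be identical to a plain weighted average.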

HTH

G

 Thank you in advance,
 Stas
 
 
 Junior Res Asst
 Hydrobiology Lab
 Institute of Limnology
 Russian Academy of Sciences
 
 Sevastyanova 9
 196105 Russia, St Petersburg
 http://www.limno.org.ru
 Phone: +7 (812) 387-80-60
 Fax: +7 (812) 388-73-27
 
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R-sig-eco] question for the R community : Plot RDA biplot without axis ?

2013-02-25 Thread Gavin Simpson
On Mon, 2013-02-25 at 13:18 -0500, Sarah Loboda wrote:
 Hi,
 I have trouble to obtain the ordination graph I want. I want to have 4 RDA
 biplot on the same page and I don't want to have (or I want to modify) the
 axis numbers. I want the marks on the axis without numbers to maximize the
 space for each RDA plot.

A problem is the call to text() ( which calls text.cca() ). It doesn't
pass on arguments to the underlying axis() calls and hence you can't do
what you are trying to do with that function directly.

Not sure why you want the axis to be white - that draws an axis so it
will obscure anything drawn before it with white paint.

The only solution at the moment will be to modify the vegan:::text.cca()
function to change the two calls to axis() at the end of the function
definition. I suspect you could just copy the body of vegan:::text.cca
and put it into your own function, but I haven't tried it. If that fails
due to namespace issues, then use assignInNamespace() to assign your
function to the text.cca function in vegan's namespace.

See the relevant help pages on how to do this. I'm about to leave the
office so I can't help further now, but if you have trouble email back
to the list and I'll see about cooking up an example...
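
In the meantime, the base graphics mechanism that a modified copy would rely on is simply axis() with labels = FALSE. The sketch below uses plain scatterplots of random numbers, not vegan objects, and the helper name draw_panel() is made up; it only shows how to get tick marks without numbers in a 2 x 2 layout.

```r
# Draw one panel with tick marks but no tick labels or axis titles
draw_panel <- function(x, y) {
  plot(x, y, axes = FALSE, ann = FALSE, pch = 19)
  tx <- axis(1, labels = FALSE, tcl = 0.2)   # ticks only, pointing inwards
  ty <- axis(2, labels = FALSE, tcl = 0.2)
  box()
  invisible(list(x = tx, y = ty))            # tick locations that were drawn
}

pdf(NULL)                                    # no file is written
op <- par(mfrow = c(2, 2), mar = c(0.5, 0.5, 0.5, 0.5))
set.seed(1)
ticks <- replicate(4, draw_panel(rnorm(10), rnorm(10)), simplify = FALSE)
par(op)
dev.off()
```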

All the best

Gavin

 This seems like a simple task but I tried different approaches and couldn't
 figure out how to change my axis. This is my code:
 
 par(mfrow=c(2,2))
 par(mar=c(0.2,0.2,0.2,0.2))
 
 ### first RDA biplot
 with(arctic, levels(site))
 shapevec <- c(19,19,19,19,19,19,19,19,19,19,19,19,6,6,6,6,6,6,6,6,6,6,6,6)
 plot(spiders.rda.a, type = "n", scaling = 2, las=1, tcl=0.2,
      col.axis="white")
 with(spiders.env.a, points(spiders.rda.a, display = "sites",
      scaling = 2, pch = shapevec, cex=1.3))
 text(spiders.rda.a, display="bp", cex=1.1, col.axis="white", ann=FALSE)
 ### It is when I run this line that my y axis appears on the right, but I
 ### don't want it to.
 ### In yellow, this is what I tried to make it disappear. Putting those
 ### arguments in plot() doesn't change anything.
 
 What should I add in the text part to make sure that the axis doesn't show
 up?
 My intention is to plot my sites as dots without text and my arrows for
 environmental variables with the name of each variable. Any other ideas on
 RDA plots will be greatly appreciated.
 Thank you very much :D




Re: [R-sig-eco] Vegan CCA - Problem in constraining variables

2013-01-18 Thread Gavin Simpson
On Thu, 2013-01-17 at 17:50 -0200, Pedro Meirelles wrote:
 Hello everyone,
 
 This is my first post in this list and I am a new R user and I am not very
 experienced in multivariate analysis, sorry if my question is silly.
 
 I am trying to run a cca (vegan package) but I am encountering a problem: I
 want to constrain the analysis by all environment variables, but it does not
 happen. The program chooses some of them. When I try to plot the results
 only these automatically chosen variables are plotted. I tried to run cca with
 vegan data (varespec and varechem) and it worked fine.
 
 There is something strange in the "Biplot scores for constraining
 variables" results. The CA's have only 0 values (below in yellow). I
 am writing the complete "Biplot scores for constraining variables" section.
 
 My data is like this:
 
 I have 9 samples (I will not show all data to reduce the email size).
 
 I have 9 environmental variables (nutrients).
  env
      Depth   Sal Nitrite Nitrate Amonia T_Org_N Silicate Total_I_P OrthoP
 Vit1    63 37.07    0.03    1.24  0.031   5.849     0.85      0.14   0.08
 Vit2    63 37.07    0.03    1.24  0.031   5.849     0.85      0.14   0.08
 Dav1    40 37.35    0.05    1.43  0.031   8.849     0.76      0.11   0.14
 Dav2    40 37.35    0.05    1.43  0.002   8.878     0.76      0.11   0.14

By the looks of those data, some of your env samples are repeated to
make the nine total samples. That won't help matters.

You can also run `alias(spe.cca)` to see which, if any, terms are linearly
dependent upon one another, as that can also reduce the effective number
of constraints.

Bottom line, vegan used what it could of your data and you need to look
at your data to see what is wrong as you have it and we don't.
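
To see why repeated samples matter, here is a rough base R sketch with made-up numbers (not your data): nine samples and nine variables, but only five distinct rows. After centring, such a matrix has rank four at most, so no more than four constrained axes can be extracted however many variables are supplied.

```r
set.seed(1)
uniq <- matrix(rnorm(5 * 9), nrow = 5)          # 5 distinct samples, 9 variables
env  <- uniq[c(1, 1, 2, 2, 3, 3, 4, 4, 5), ]    # 9 samples, most of them repeats
colnames(env) <- paste0("var", 1:9)

n_distinct   <- nrow(unique(env))               # number of distinct samples
rank_centred <- qr(scale(env, scale = FALSE))$rank  # rank after column centring

n_distinct     # 5
rank_centred   # 4
```

The same check on the real `env` data frame (with qr() applied to the centred numeric matrix) would show how many independent constraints are actually available.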

HTH

G

 .
 .
 
 I have 32 species (actually this is bacterial and archeal phyla). I will
 not show all columns to reduce the email size.
 
  spe
   Acidobacteria Actinobacteria Aquificae ...
 Vit1  9129 5  ...
 Vit2 14154 8  ...
 Dav1  8122 4  ...
 
 
 I am running the cca with the following script:
 
 library(vegan)
 spe <- read.csv("cvt_org_test.csv", row.names=1)
 env <- read.csv("test_chem2.csv", row.names=1)
 spe.hel <- decostand(spe, "hellinger")
 spe.cca <- cca(spe.hel, env)
 summary(spe.cca) # Scaling 2 (default)
 Call:
 cca(X = spe.hel, Y = env)
 
 Partitioning of mean squared contingency coefficient:
               Inertia Proportion
 Total         0.06054     1.0000
 Constrained   0.03120     0.5154
 Unconstrained 0.02934     0.4846
 
 Eigenvalues, and their contribution to the mean squared contingency
 coefficient
 
 Importance of components:
                          CCA1     CCA2     CCA3     CCA4     CA1      CA2     CA3      CA4
 Eigenvalue            0.01906 0.006717 0.003055 0.002371 0.01433 0.007767 0.00482 0.002418
 Proportion Explained  0.31480 0.110960 0.050470 0.039170 0.23673 0.128300 0.07963 0.039940
 Cumulative Proportion 0.31480 0.425760 0.476230 0.515400 0.75213 0.880430 0.96006 1.00
 
 Accumulated constrained eigenvalues
 Importance of components:
                          CCA1     CCA2     CCA3     CCA4
 Eigenvalue            0.01906 0.006717 0.003055 0.002371
 Proportion Explained  0.61079 0.215280 0.097920 0.076010
 Cumulative Proportion 0.61079 0.826070 0.923990 1.00
 
 Scaling 2 for species and site scores
 * Species are scaled proportional to eigenvalues
 * Sites are unscaled: weighted dispersion equal on all dimensions
 
 
 Species scores
 
                    CCA1      CCA2      CCA3      CCA4       CA1        CA2
 Acidobacteria  0.065258  0.038704  0.009574 -0.112006 0.0817235  0.0143286
 Actinobacteria 0.077451  0.049255 -0.046415  0.045290 0.0178117 -0.0231979
 Aquificae      0.137472 -0.057698  0.017360 -0.017047 0.0859391 -0.0749676
 .
 .
 .
 
 Site scores (weighted averages of species scores)
 
          CCA1    CCA2    CCA3      CCA4      CA1      CA2
 Vit1 -0.35189 -0.7507 -2.6425 -0.961233 -0.72976  0.08807
 Vit2  0.04988 -0.2552 -1.0121  0.321500  0.66991 -0.08085
 .
 .
 .
 
 Site constraints (linear combinations of constraining variables)
 
         CCA1    CCA2    CCA3    CCA4      CA1      CA2
 Vit1 -0.1424 -0.4923 -1.7924 -0.2924 -0.72976  0.08807
 Vit2 -0.1424 -0.4923 -1.7924 -0.2924  0.66991 -0.08085
 .
 .
 .
 
 Biplot scores for constraining variables
 
            CCA1    CCA2    CCA3     CCA4 CA1 CA2
 Depth    0.6576 -0.3701 -0.6555 -0.02914   0   0
 Sal     -0.6385  0.3949  0.6603  0.01846   0   0
 Nitrite  0.5303  0.7679  0.1164 -0.33999   0   0
 Amonia  -0.9195  0.1331  0.3570  0.09710   0   0
 
 
 Thank you very much for your help and support!
 
 All the best!
 


Re: [R-sig-eco] Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'

2013-01-16 Thread Gavin Simpson
Hi Jesse,

Can you send me the data *and* *exact* code you used so I can look into
this further? I promise to delete the data once I have gotten to the
bottom of the problem.

If you can, please do so *off list*. If you can't then it might help to
scale the numbers a bit, as a range of 5 orders of magnitude may be causing
some numerical issues with your data.

Note this has nothing to do with vegan; cocorresp is a separate package.

Re the last question; it is possible and IIRC there is some Matlab code
to do some of this in the supplementary materials for the ter Braak &
Schaffers paper. I got some way towards implementing this in R but finishing
it off went on the back burner and I have never got back to it since.

All the best,

Gavin

On Tue, 2013-01-15 at 19:09 -0800, Jesse_B wrote:
 Question 1 - It's been a while, so I don't know who will see this, but I am
 having the same issue.  I have count data from two species matrices (fish
 and inverts) and I am trying to run CoCA through cocorresp.  Symmetric CoCA
 works fine, and is the main thing that I need, but I would like to be able
 to switch predictor-response species matrices in a predictive CoCA, to see
 if there are differing patterns of top-down/bottom-up concordance.   I have
 substantial skew in the data, so I have log+1 transformed both sets of data. 
 I can run crossval on the raw data (not transformed, 99 samples [33 sites
 over 3 seasons], 72 fish species, 226 invert species, individual numbers are
 in the same ranges, between 0 and 10K for both fish and inverts), but on the
 transformed data, I get the Error in La.svd(x, nu, nv) : error code 1 from
 Lapack routine 'dgesdd'  message consistently on the 5th site.  I am
 comfortable _using_ R and the vegan package in particular, but I am not
 experienced in more deep coding, so I don't have a handle on how to turn
 LINPACK on.  R version 2.15.2, vegan 2.0-4, cocorresp 0.2-0
 
  crossval(fishlog,invertlog)
 LOO - Site: 1 - Complete
 LOO - Site: 2 - Complete
 LOO - Site: 3 - Complete
 LOO - Site: 4 - Complete
 LOO - Site: 5Error in La.svd(x, nu, nv) : error code 1 from Lapack routine
 'dgesdd'
 
 Question 2 - Is it possible to run crossval on matrices for a CCA, to make
 it a PLS-CCA (as in Schaffers et al. 2008), or am I misunderstanding the
 process that they used?
 
 Thanks in advance!
 
 Jesse
 
 
 
 
 --
 View this message in context: 
 http://r-sig-ecology.471788.n2.nabble.com/Error-in-La-svd-x-nu-nv-error-code-1-from-Lapack-routine-dgesdd-tp7556369p7577802.html
 Sent from the r-sig-ecology mailing list archive at Nabble.com.
 


Re: [R-sig-eco] GCV in gam with mgcv

2013-01-16 Thread Gavin Simpson
On Tue, 2013-01-15 at 14:40 +, Emmanuel Castella wrote:
 Dear R-users
 I am trying to understand the difference between the GCV score
 returned by summary() of a gam object in the mgcv package, and the
 output of a k-fold cross-validation of the same gam model. Thank you
 in advance for any answer or reference to a published document.
 Sincerely
 Emmanuel Castella

I don't have all the details to hand, but as no one has replied,
publicly at least, I'll have a stab. GCV is a means of approximating the
results of an explicit CV. The GCV score in mgcv::gam() is the minimum
GCV score arrived at during fitting for various values of the smoothness
penalty; in other words it is the thing that was minimised to arrive at
the fitted model for which smoothness selection was performed.

If you did a k-fold CV to find the optimal smoothing parameters, the
degrees of freedom for the fitted smooth(s) should be similar to the
degrees of freedom used by the model when GCV was used.

The difference between the two is that GCV approximates what you'd get
if you did the actual CV without having to actually fit all those models
during the CV steps. Not doing all the fitting can be a huge
computational saving!

For the details I strongly suggest you read Simon Wood's book

  Wood, S.N. (2006) Generalized Additive Models: An
  Introduction with R. Chapman and Hall/CRC.

And having said all that, Simon and colleagues have shown that GCV can
under-smooth data, especially when the objective function is very flat
in the region of the optimal model. The conclusion that I draw from
their papers is that using `method = "REML"` or `method = "ML"`
usually provides the best-performing smoothness selection and in
addition one can turn on extra penalties (one via `select = TRUE`, the
other via an argument to each `s()` term) that will perform model
selection (i.e. shrink terms out of the model) for you.
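
A small sketch of those options on simulated data follows. The data-generating model is made up (x2 deliberately has no effect), and I am assuming that the per-term argument referred to is the shrinkage basis, `bs = "ts"`.

```r
library(mgcv)

set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)                                  # nuisance covariate, no effect
y  <- sin(2 * pi * x1) + rnorm(n, sd = 0.3)

# Smoothness selection by GCV
m_gcv <- gam(y ~ s(x1) + s(x2), method = "GCV.Cp")
m_gcv$gcv.ubre                                  # the minimised GCV score

# REML smoothness selection, with the extra penalty turned on via select = TRUE
m_reml <- gam(y ~ s(x1) + s(x2), method = "REML", select = TRUE)

# Alternative: shrinkage smoothers requested per term
m_ts <- gam(y ~ s(x1, bs = "ts") + s(x2, bs = "ts"), method = "REML")

# Effective degrees of freedom per smooth under each approach
summary(m_gcv)$edf
summary(m_reml)$edf
summary(m_ts)$edf
```

Comparing the effective degrees of freedom of s(x2) across the three fits shows how much each approach shrinks a term that is not needed.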

HTH

G 

 
 


Re: [R-sig-eco] betadisper() and variation in community change

2012-10-22 Thread Gavin Simpson
betadisper() will give you a test of homogeneity of variance between
groups; in other words it tests the null that the variance of the groups
of sites does not change. adonis() is a test of location, which will
test the null of no compositional change between groups of sites. Here I
assume by "groups of sites", "groups" is defined by the time points of
sampling?

It is somewhat unclear what you mean by "variation in plant community
change over time among sites". Do you want to know if on average the
composition of your groups of sites is changing through time? Or do you
want to know something about change over time in that compositional
change? The former could be done by adonis(); the latter could be seen as
an analysis of the first derivative of the change given by adonis() and
I don't think we have anything for that easily to hand in vegan or R.
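
To make the dispersion versus location distinction concrete, here is a minimal sketch using the grazed/ungrazed grouping of varespec from the betadisper() help page. I use adonis2(), the successor to adonis() in current versions of vegan; the number of permutations and the seed are arbitrary.

```r
library(vegan)
data(varespec)

dis    <- vegdist(varespec)                     # Bray-Curtis dissimilarities
groups <- factor(c(rep(1, 16), rep(2, 8)),
                 labels = c("grazed", "ungrazed"))

# Dispersion: do the groups differ in spread about their centroids?
mod <- betadisper(dis, groups)
anova(mod)

# Location: do the group centroids differ?
set.seed(1)
loc <- adonis2(dis ~ groups, permutations = 999)
loc
```

A significant betadisper() result alongside a significant location test is a warning that the apparent difference in location may partly reflect differences in spread.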

Perhaps you could describe more clearly what it is that you hope to
show/demonstrate?

I have some rough code implementing an idea of Jari's (and a colleague
of his) for a better way of quantifying rates of composition change and
total amount of composition change:

JO & Tonteri, T. Rate of compositional turnover along gradients and total
gradient length. Journal of Vegetation Science 6, 815-824.
But you would have to do that for each site individually and then
perform some other test if you want to group data. The code is far from
production - it was a proof of concept using smooth.spline() which
doesn't really model the mean-variance relationship of abundance or
compositional data properly, but it had analytical derivatives via the
predict() method so it got used out of convenience rather than fitting
via mgcv:::gam() and having to derive derivatives via finite differences
(which is easy but tedious).
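
For what it is worth, both routes to the first derivative are short with smooth.spline(). The sketch below is not my proof-of-concept code, just an illustration on a simulated noisy sine curve; the step size h is an arbitrary choice.

```r
set.seed(1)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.1)

fit <- smooth.spline(x, y)
x0  <- seq(0.5, 5.5, by = 0.5)                  # where to evaluate the derivative

# Analytical first derivative of the fitted spline via the predict() method
d_analytic <- predict(fit, x0, deriv = 1)$y

# The same quantity by central finite differences of the fitted curve
h    <- 1e-4
d_fd <- (predict(fit, x0 + h)$y - predict(fit, x0 - h)$y) / (2 * h)

max(abs(d_analytic - d_fd))                     # the two agree closely
```

With a GAM the finite-difference route works the same way: predict at x0 - h and x0 + h and divide the difference by 2h.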

HTH

G

On Wed, 2012-10-17 at 15:48 +0200, Gustaf Granath wrote:
 Dear all,
 I want to compare variation in plant community change over time among
 sites. The focus is to quantify the species community change over time and
 not so much to test if the community has changed significantly. The
 sites do not share many species but I hope to investigate their
 variation over time.
 I am a bit unsure if my approach using betadisper() is correct and maybe
 there are better ways to do this. Any input is appreciated.
 
 For simplicity, say that we have 2 sites with 4 and 2 subplots,
 respectively. These plots were measured at e.g. 5-year intervals.
 The aim is to compare variation of plant community change over time
 between the two sites. Note that this is at the site level, not at the
 subplot level.
 library(vegan)
 ## Let's use the grazed/ungrazed example but here I treat it as
 ## 4 subplots (grazed) and 2 subplots (ungrazed). These
 ## subplots were sampled 4 times.
 data(varespec)
 dis <- vegdist(varespec)
 sites <- factor(c(rep(1,16), rep(2,8)), labels = c("grazed","ungrazed"))
 time <- c(rep(1:4,each=4),rep(1:4,each=2))
 sites_time <- factor(paste(sites,time,sep="_"))
 ## Calculate multivariate dispersions for each
 ## site:time combination
 mod1 <- betadisper(dis, sites_time)
 #extract centroid positions for each site:time combination
 cen.si <- scores(mod1, display = c("centroid"), choices = c(1,2))
 # Calculate distance matrix
 time.dist <- vegdist(cen.si,method="euclidean")
 sites_main <- factor(c(rep(1,4), rep(2,4)), labels = c("grazed","ungrazed"))
 # Run betadisper() on the site:time positions
 mod2 <- betadisper(time.dist,sites_main)
 #compare the two sites
 boxplot(mod2)
 
 #use 3 PCoA axes to get the 3D movement of the plant community
 cen.si <- scores(mod1, display = c("centroid"), choices = c(1,2,3))
 time.dist <- vegdist(cen.si,method="euclidean")
 sites_main <- factor(c(rep(1,4), rep(2,4)), labels = c("grazed","ungrazed"))
 mod3 <- betadisper(time.dist,sites_main)
 boxplot(mod3)
 
 Cheers,
 
 Gustaf
 



Re: [R-sig-eco] betadisper() and variation in community change

2012-10-22 Thread Gavin Simpson
Anything where you are interested in the centroid moving in
compositional space is handled by `adonis()` at least for testing if the
change is significant.

betadisper() handles the case when you want to know if the spread about
the centroid is changing.

Neither does what you want in total, I don't think. You could just look at
how betadisper() computes the centroids and do that for sites through time
and then repeat for each sub-plot through time. At least then you'd have
the data, but that is the only bit of the betadisper() code you want.

I'm not sure then about how you would start testing anything...

HTH

G

On Mon, 2012-10-22 at 15:46 +0200, Gustaf Granath wrote:
 Thanks Gavin. Let me try to explain what I'm trying to do.
 
 My data are hierarchically structured, I have subplots within sites
 and  these subplots are measured over many years. I want to compare
 how vegetation composition varies over time at the site level (a bit
 like the example in Zuur 2007, pp 463, but more sites, not only two).
 In addition, not mentioned in my earlier question, I also want to
 compare the site variation with subplot variation. I guess you can see
 it as I want to quantify how much the site centroids move around in an
 ordination with how much the subplots centroids move around on average
 over time. So if I had one response variable, this would be a rather
 simple task in a mixed-model where I can extract the different
 variance components - the problem is to do this with multivariate data
 (can you say variance of compositional change?). 
 
 The reason for this kind of analyses is that we have hypotheses
 regarding how species composition in different habitats varies over
 time, some sites are predicted to vary more than others. In addition,
 we want to investigate if the within-site variation also differs. For
 example, a site may show little compositional variation over time
 overall, but the subplots may show large variation.
 
 Gustaf
 
 
Re: [R-sig-eco] Question on block effect significance in 'adonis' PERMANOVA

2012-07-30 Thread Gavin Simpson
On Fri, 2012-07-27 at 09:45 +0100, Gavin Simpson wrote:
 On Mon, 2012-07-23 at 10:31 -0200, Alexandre Fadigas de Souza wrote:
  Dear friends,
  
 In performing a PERMANOVA in VEGAN's 'adonis' function, we can use strata
  to indicate a blocking variable. Contrary to standard statistical packages,
  however, adonis' output does not include the block variable explicitly among
  its Sources of Variation.

Sorry, but the first line of my reply contains a critical typo:

 That is what is intended for `strata`.

That should have read "That is *not* what is intended for `strata`."
Hopefully that was obvious given the following sentence, and I hope I
didn't cause any confusion.

G

  It is a conditioning variable for
 the permutation test *only*, so it never enters into the computations
 other than to force the permutations to be freely exchangeable within
 each hill but not exchangeable between hills.
 
 If you want to include the effect of hill it needs to be included as a
 variable in the formula.
 
 I'm not fully familiar with adonis() but if you include hill in the
 fixed effects formula then I don't think you can test the significance of
 hill if you also use `strata = hill` (because each permutation will
 essentially have the same samples allocated at the hill level).
 
 HTH
 
 G
 
 Sometimes, as is my present situation, the blocking variable is
  interpretable and we want to know whether it varied significantly.
  
 I could not find this discussion in the log files of this list.
  
 Our response variable is the abundances of tree species in 18 square plots
  in southern Brazil, our explanatory variables are distance to a power dam
  (near x far, binary) and altitude. Our strata/blocking variable is hill, since
  treatments near or far from the power dam were replicated on the slopes of
  three different hills.
  
 It would be important to know if there were compositional differences
  between hills but this result does not appear in the output. We tried to force
  its appearance using the following formula, but I do not know if this distorts
  the analysis:
  
   adonis(species ~ hill + dam + altitude + declivity, method = "Chao",
          strata = environment$hill, perm = 4999)
  
Thanks in advance for your attention,
  
Alexandre
  
  ___
  R-sig-ecology mailing list
  R-sig-ecology@r-project.org
  https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
  
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


Re: [R-sig-eco] Question on block effect significance in 'adonis' PERMANOVA

2012-07-27 Thread Gavin Simpson
On Mon, 2012-07-23 at 10:31 -0200, Alexandre Fadigas de Souza wrote:
 Dear friends,
 
In performing a PERMANOVA in VEGAN's 'adonis' function, we can use strata
 to indicate a blocking variable. Contrary to standard statistical packages,
 however, adonis' output does not include the block variable explicitly among
 its Sources of Variation.

That is what is intended for `strata`. It is a conditioning variable for
the permutation test *only*, so it never enters into the computations
other than to force the permutations to be freely exchangeable within
each hill but not exchangeable between hills.

If you want to include the effect of hill it needs to be included as a
variable in the formula.

I'm not fully familiar with adonis() but if you include hill in the
fixed effects formula then I don't think you can test the significance of
hill if you also use `strata = hill` (because each permutation will
essentially have the same samples allocated at the hill level).

HTH

G
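
A rough base-R illustration of why restricting permutations to within hills rules out a test of hill itself. This is only a sketch, not vegan's internal code, and the `hill` factor is made up to mirror 18 plots on three hills. Shuffling plots within hills never changes which hill a plot belongs to, so the hill grouping is identical in every permutation:

```r
# Hypothetical design: 18 plots, 6 on each of three hills
hill <- factor(rep(c("A", "B", "C"), each = 6))

set.seed(1)
# One restricted permutation: shuffle plot indices within each hill only
perm <- ave(seq_along(hill), hill, FUN = sample)

# Plots change places, but never move to a different hill
same_hill <- identical(hill[perm], hill)
same_hill
```

Because the permuted hill labels always equal the observed ones, the permuted statistic for a hill term can never differ from the observed statistic, which is why such a test says nothing about hill.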

Sometimes, as is my present situation, the blocking variable is
 interpretable and we want to know whether it varied significantly.
 
I could not find this discussion in the log files of this list.
 
Our response variable is the abundances of tree species in 18 square plots
 in southern Brazil, our explanatory variables are distance to a power dam
 (near x far, binary) and altitude. Our strata/blocking variable is hill, since
 treatments near or far from the power dam were replicated on the slopes of
 three different hills.
 
It would be important to know if there were compositional differences
 between hills but this result does not appear in the output. We tried to force
 its appearance using the following formula, but I do not know if this distorts
 the analysis:
 
  adonis(species ~ hill + dam + altitude + declivity, method = "Chao",
         strata = environment$hill, perm = 4999)
 
   Thanks in advance for your attention,
 
   Alexandre
 


Re: [R-sig-eco] Fwd: metaMDS - avoid species overlap in plots

2012-07-19 Thread Gavin Simpson
On Mon, 2012-06-25 at 16:46 +0200, Kay Cichini wrote:
 Hi!
 
 library(vegan)
 data(dune)
 sol <- metaMDS(dune)
 
 # use argument air! see ?orditorp:
 plot(sol, type = "n")
 orditorp(sol, display = "sp", air = 1)
 
 
 # or use pointLabel() from maptools package. this should avoid overplotting
 # of text:

Coming to this late, but we have ordipointlabel() in vegan, which is
modelled on pointLabel() and aims to do the same thing without having to
rely upon the spatial stack. It also knows about ordination objects in
vegan.

G
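
A minimal sketch of the ordipointlabel() route, assuming a reasonably recent vegan and the `dune` data that ships with it. The seed and `trace = FALSE` are only there to keep the example repeatable and quiet:

```r
library(vegan)
data(dune)

set.seed(42)                        # metaMDS() uses random starts
sol <- metaMDS(dune, trace = FALSE)

# Draw species points and place their labels so as to reduce overlap
ordipointlabel(sol, display = "species")
```

Label placement is found by optimisation, so the exact layout can differ a little between runs unless the seed is fixed.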

 # here is an example where I added cex according to species frequencies.
 library(maptools)
 x = as.vector(sol$species[,1])
 y = as.vector(sol$species[,2])
 w = row.names(sol$species)
 cex.lab = colSums(dune > 0) / nrow(dune) + 1
 col.lab = rgb(0.2, 0.5, 0.4, alpha = 0.6)
 
 plot(sol, type = "n")
 points(sol, display = "species", cex = 1, pch = 4, col = 3)
 pointLabel(x, y, w, col = col.lab, cex = cex.lab)
 
 
 HTH,
 Kay
 
 
 2012/6/25 Gian Maria Niccol Benucci gian.benu...@gmail.com
 
  Hi community,
 
  Very simple question. How can I avoid overlapping species labels in a
  metaMDS() diagram?
 
  I used this command...
 
   text(metaMDS_new2, display=c("species"), cex=0.6)
 
  but some species are plotted on top of one another and the diagram is
  not easy to read.
 
  Thanks for helping,
 
  --
  Gian
 

Re: [R-sig-eco] Multiple comparisons among predictors generated from same data

2012-05-25 Thread Gavin Simpson
On Thu, 2012-05-24 at 15:00 -0700, J Straka wrote:
 Hello,
 
 I'm planning on using a regression model to describe seed set of plants (my
 response) using some sort of predictor based on temperature.  I have a
 number of temperature variables calculated from the same set of data
 (hourly temperatures for the growing season, converted to variables such as
 average temperature, maximum temperature, minimum temperature, degree-days
 above zero Celsius, degree days above ten Celsius, etc...), and I want to
 decide which one should be included in my model. I know that I would
 ideally select one based on prior knowledge of the system (e.g. so-called
 planned comparisons or choosing a temperature threshold that is known to
 be important for the development of seeds), but not much is known about
 this system.

What is the model for? Understanding, where you want to interpret the
coefficients directly as something meaningful, or prediction?

If the latter I would say it doesn't really matter; choose the model
that gives the best out-of-sample predictions (lowest error etc), or
average predictions over a set of best/good models. Simply choosing the
best model via some sort of selection procedure may result in a model
with high variance (change the data a bit and different variables would
be selected). If so, consider a regression method that applies shrinkage
to the coefficients such as the lasso or the elastic net; this will lead
to a small bit of bias in the estimates of the coefficients but should
reduce the variance of the final model because you are considering the
selection of variables as part of the model itself.
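
As a hedged sketch of "choose the model that gives the best out-of-sample predictions", the code below compares single-predictor models by 5-fold cross-validated RMSE using base R only. The data are simulated and the variable names (`temp_mean`, `temp_max`, `dd10`) are invented for illustration; they are not from the original question:

```r
set.seed(123)
n <- 60
temp_mean <- rnorm(n, 12, 2)
temp_max  <- temp_mean + rnorm(n, 6, 1.5)
dd10      <- pmax(temp_mean - 10, 0) * 30 + rnorm(n, 0, 10)
seed_set  <- 5 + 0.8 * temp_mean + rnorm(n, 0, 2)
dat <- data.frame(seed_set, temp_mean, temp_max, dd10)

candidates <- c("temp_mean", "temp_max", "dd10")
folds <- sample(rep(1:5, length.out = n))   # random fold membership

cv_rmse <- sapply(candidates, function(v) {
  f <- reformulate(v, response = "seed_set")
  sq_err <- numeric(n)
  for (k in 1:5) {
    test <- folds == k
    fit <- lm(f, data = dat[!test, ])        # fit on the other four folds
    pred <- predict(fit, newdata = dat[test, ])
    sq_err[test] <- (dat$seed_set[test] - pred)^2
  }
  sqrt(mean(sq_err))                         # out-of-sample RMSE
})
cv_rmse
```

Because the candidate predictors are strongly correlated, the ranking can change from one resample to the next, which is the high-variance problem described above.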

If you want to interpret the model coefficients as something real then
you have to be very careful doing any form of selection; the stepwise
procedures and best subsets can all potentially lead to strong bias in
the model coefficients. By removing a variable from the model, in effect
you are saying that the sample estimate of the effect of that variable
on the response is 0, not some small (statistically insignificant)
value.

This is a very tricky thing to get right and I'm not sure I know the
right answer (or even if there is one!?).

 I've been warned against testing the significance of multiple predictors
 using p-values, unless I use Bonferroni correction (or some equivalent).
 Unfortunately, using Bonferroni correction would result in something like p
 = 0.05/7 (for seven different temperature variables); a rather small value
 for detecting anything! I was wondering whether it would be appropriate to
 instead use likelihood-based techniques (direct comparisons of
 log-likelihoods or AIC scores) to compare a series of models using each of
 the alternative predictors in turn, and choose the most relevant
 temperature variable (i.e. predictor) based on that.

Choosing models by AIC or BIC is just the same as doing it using
p-values; the selection procedure has all the problems I mention above.
LRTs require a significance test of the ratio of the two likelihoods, so
you are still doing a series of sequential tests whose overall error
rate you might want to control.

There are other corrections for multiple testing. For example, see the
p.adjust() function in R for some options.
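
For example, with seven hypothetical p-values (made up here, one per temperature variable), p.adjust() from the stats package gives the Bonferroni adjustment alongside two less conservative alternatives:

```r
# Hypothetical raw p-values for seven candidate predictors
p <- c(0.004, 0.012, 0.030, 0.041, 0.20, 0.47, 0.81)

adj <- cbind(raw        = p,
             bonferroni = p.adjust(p, method = "bonferroni"),
             holm       = p.adjust(p, method = "holm"),
             BH         = p.adjust(p, method = "BH"))
round(adj, 3)
```

Holm controls the same family-wise error rate as Bonferroni but is never more conservative; "BH" controls the false discovery rate instead, which is a different and weaker guarantee.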

HTH

G

 Thoughts on the validity of this approach? Would any adjustments have to be
 made for multiple comparisons if I used this strategy?
 
 Jason Straka
 University of Victoria
 

Re: [R-sig-eco] Poisson regression

2012-05-24 Thread Gavin Simpson
On Thu, 2012-05-24 at 11:41 +0430, Mahnaz Rabbaniha wrote:
 Dear all
 
 
 to find the relationship between a non-normal response and the independent
 variables I use this code:
 
 model1 <- gam(Clupeidae~s(depth)+s(temperature)+s(salinity),poisson)

I don't know if this is the problem or not, but I would not call the model
that way. If you don't name arguments then you must list them in the
order the function expects. You don't say which `gam()` you use, but
assuming it is `mgcv::gam()` then the second argument is `family`, so an
unnamed `poisson` happens to land in the right place; naming it makes the
call unambiguous.

Does it work if you do:

model1 <- gam(Clupeidae ~ s(depth) + s(temperature) + s(salinity),
              family = poisson)

?

Really though you should be passing it both a data and a family
argument. Assuming your data are in an object named `mydata` then:

model1 <- gam(Clupeidae ~ s(depth) + s(temperature) + s(salinity),
              data = mydata, family = poisson)

would be the correct way to work with the function.

G
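
A self-contained sketch of the fully named call, assuming `mgcv::gam()` and using simulated stand-in data, since the real `mydata` is not available. Naming `data` and `family` means the call does not depend on argument order:

```r
library(mgcv)

set.seed(1)
# Simulated stand-in for the real data; values are invented
mydata <- data.frame(depth       = runif(100, 5, 50),
                     temperature = runif(100, 18, 30),
                     salinity    = runif(100, 30, 40))
mydata$Clupeidae <- rpois(100, lambda = exp(0.5 + 0.02 * mydata$depth))

model1 <- gam(Clupeidae ~ s(depth) + s(temperature) + s(salinity),
              data = mydata, family = poisson)
summary(model1)
```

Separately, the `non-integer x` warning from dpois() quoted in the question is what R reports when a Poisson response contains values that are not whole numbers, so it may be worth checking that `Clupeidae` holds raw counts rather than transformed or averaged values.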

 the result shown is:
 
 There were 50 or more warnings (use warnings() to see the first 50)
  warnings(model1)
 Warning messages:
 1: In dpois(y, mu, log = TRUE) : non-integer x = 2.079542 Error in
 cat(list(...), file, sep, fill, labels, append) :
   argument 2 (type 'list') cannot be handled by 'cat'
 
 
 What does this mean?
 
 Can I continue to use the model?
 
 thanks


Re: [R-sig-eco] CA

2012-05-11 Thread Gavin Simpson
If by CA you mean Correspondence Analysis then see the Environmetrics
Task View, which lists several potential options with an ecological bent:

http://cran.r-project.org/web/views/Environmetrics.html

You might also consider the Multivariate Task View for a wider
selection:

http://cran.r-project.org/web/views/Multivariate.html

I'm biased but the vegan package would be a good place to start...

G

On Fri, 2012-05-11 at 08:39 +0430, Mahnaz Rabbaniha wrote:
 Hi, I would like to use CA on my data.
 
 It is a matrix with 16 columns (abundance of fish larvae) and 84 rows; the
 rows fall into three groups: coralline, non-coralline and creek habitat. The
 data are non-normal and I have to use non-parametric analyses; I want to find
 similarity or dissimilarity between these habitats.
 I used NMDS in the past but the stress was not good.
 
 please guide me
 
 thanks
 
 Mahnaz
 

Re: [R-sig-eco] [R] reception of (Vegan) envfit analysis by manuscript reviewers

2012-05-10 Thread Gavin Simpson
On Wed, 2012-05-09 at 15:51 -0600, Matt Bakker wrote:
 I'm getting lots of grief from reviewers about figures generated with
 the envfit function in the Vegan package. Has anyone else struggled to
 effectively explain this analysis? If so, can you share any helpful
 tips?
 
 The most recent comment I've gotten back: "What this shows is which
 NMDS axis separates the communities, not the relationship between the
 edaphic factor and the Bray-Curtis distance."

Without further context for that quote and your manuscript to see how
you are using the method it is difficult to say whether you are doing
something silly or the reviewer is bone-headed.

I've had similar comments from reviewers about my use of the ordisurf()
function. In each case it was the reviewers' failure to understand the
methods applied that was the cause of the confusion.

As you provide little or no context I'll explain what envfit() does etc.

The idea goes back a long way (!) and is in my 1995 edition of Jongman
et al., Data Analysis in Community and Landscape Ecology (Cambridge
University Press), though it was most likely in the 1987 version too. See
Section 5.4 of the Ordination chapter by Ter Braak in that book.

The idea is to find the direction (in the k-dimensional ordination
space) that has maximal correlation with an external variable.
Essentially, we have:

E(z_j) = b_0 + b_1x_1 + b_2x_2

where E(z_j) is the expectation (or mean, or fitted values) of the jth
external (environmental) variable, x_1 and x_2 are the axis scores in
ordination dimensions 1 and 2, and b_0, b_1 and b_2 are unknown regression
coefficients. This generalises to more than 2 dimensions or axes.

The biplot arrow drawn goes from (0,0) to (b_1, b_2).

You can see that the aim is to model or predict the values of the jth
environmental variable (z_j) as a linear combination of the axis or
site scores of the samples in the ordination space. Exactly the same
idea underlies the ordisurf() function except that we use a GAM and for
the right hand side of the equation multivariate splines are used which
allow a non-linear surface instead of a plane.
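
The regression described above can be sketched in base R. This is an illustration of the idea only, not vegan's envfit() code (which also handles factors, weights and permutation tests); the community data are simulated and `cmdscale()` stands in for whatever ordination was actually used:

```r
set.seed(10)
n <- 40
gradient <- runif(n)                       # hypothetical environmental variable z

# Simulated abundances of 8 species with optima spread along the gradient
comm <- sapply(seq(0.1, 0.9, length.out = 8),
               function(opt) rpois(n, 20 * exp(-((gradient - opt)^2) / 0.05)))

ord <- cmdscale(dist(sqrt(comm)), k = 2)   # stand-in 2-d ordination
sc  <- data.frame(x1 = ord[, 1], x2 = ord[, 2], z = gradient)

fit <- lm(z ~ x1 + x2, data = sc)          # E(z) = b_0 + b_1 x_1 + b_2 x_2
b   <- coef(fit)[c("x1", "x2")]
arrow <- b / sqrt(sum(b^2))                # unit-length direction of the arrow
r2    <- summary(fit)$r.squared            # strength of the linear fit

plot(ord, xlab = "Axis 1", ylab = "Axis 2")
len <- 0.5 * max(abs(ord))                 # arbitrary plotting length
arrows(0, 0, arrow[1] * len, arrow[2] * len, length = 0.1)
```

Only the direction of the arrow carries information here; its plotted length is a display choice, and the R-squared is what summarises how well the plane fits.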

When applied to nMDS, if the nMDS provides a reasonable approximation to
the original dissimilarities, then envfit() will estimate and show the
strengths of the correlation and direction of maximal correlation
between the nMDS configuration and the jth environmental variable. This
technique can be used to indicate if one or more environmental variables
are associated with differences between sites/samples as represented in
the nMDS ordination.

The big caveat is the implication that the correlation or relationship
between z_j and the ordination space is linear. ordisurf() allows you to
relax this assumption as we fit a potentially non-linear surface to the
ordination space instead of the plane that envfit() effectively produces
(though we show only the direction of change with the arrow).
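
To make the plane-versus-surface contrast concrete, here is a sketch with simulated data, using `mgcv::gam()` directly rather than ordisurf(). `x1` and `x2` are invented stand-ins for ordination scores, and `z` is deliberately hump-shaped so that a plane fits badly:

```r
library(mgcv)

set.seed(11)
n  <- 100
x1 <- runif(n, -1, 1)                     # stand-ins for axis 1 and axis 2 scores
x2 <- runif(n, -1, 1)
z  <- exp(-(x1^2 + x2^2) / 0.3) + rnorm(n, sd = 0.1)   # non-linear response

plane   <- lm(z ~ x1 + x2)                       # what a fitted vector summarises
surface <- gam(z ~ s(x1, x2), method = "REML")   # smooth, possibly non-linear

c(plane = summary(plane)$r.squared, surface = summary(surface)$r.sq)
vis.gam(surface, view = c("x1", "x2"), plot.type = "contour")
```

With a response like this the arrow from the linear fit would be close to meaningless, while the contours of the smooth surface show the hump.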

So without seeing your manuscript or more context (and I'm not promising
to read it or comment more if you provide it) I would suggest that, *if*
you have applied nMDS and used envfit() correctly, the combined analysis
*does* reflect the *linear* relationship between the edaphic factor and
the Bray-Curtis distance, assuming of course that the nMDS has low
stress (i.e. fits the original dissimilarities well).

In future, you should consider posting similar questions
(ecological/environmental) to the R-SIG-Ecology list instead of the main
R-Help list. I know Jari (lead developer of vegan and author of
envfit() ) has stopped regularly reading the main R-Help list and you
will get far more eyes familiar with these techniques on the
R-SIG-Ecology list.

I have taken the liberty of cc'ing this to the R-Sig-Ecology list so
others can comment.

HTH

G

 Thanks for any suggestions!
 
 
 Matt
 
 __
 r-h...@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


Re: [R-sig-eco] [R] reception of (Vegan) envfit analysis by manuscript reviewers

2012-05-10 Thread Gavin Simpson
I've removed R-Help from this now...

On Thu, 2012-05-10 at 10:13 +, Jari Oksanen wrote:
 On 10/05/2012, at 11:45 AM, Gavin Simpson wrote:
<snip />
  As you provide little or no context I'll explain what envfit() does etc.
  
  The idea goes back a long way (!) and is in my 1995 edition of Jongman
  et al Data Analysis in Community and Landscape Ecology (Cambridge
  University Press) though most likely was in 1987 version too. See
  Section 5.4 of the Ordination chapter by Ter Braak in that book.
  
  The idea is to find the direction (in the k-dimensional ordination
  space) that has maximal correlation with an external variable.
 
 
 Hello,

<snip />

 Then about Bray-Curtis. The referee may be correct when writing that
 the fitted vectors are not directly related to Bray-Curtis. You fit
 the vectors to the NMDS ordination, and that is a non-linear mapping
 from Bray-Curtis to the metric ordination space.  There are two points
 here: non-linearity and stress. Because of these, it is not strictly
 about B-C. Of course, the referee is wrong when writing about NMDS
 axes: the fitted vector has nothing to do with axes (unless you rotate
 your axis parallel to the fitted vector which you can do). The NMDS is
 based on Bray-Curtis, but it is not the same, and the vector fitting
 is based on NMDS. So why not write that it is about NMDS? Why insist
 on Bray-Curtis which is only in the background?

Right, agreed. The analysis is one step removed from the B-C, but the
point of doing the nMDS was to find a low-d mapping of these B-C
distances, so *if* the mapping is a good one then we can talk about
correlations between distances between sites and the environmental
variables. Whilst it might be strictly more correct to talk about this
from the point of view of the nMDS, the implication is that for
significant envfit()s there is a significant linear correlation
between the environmental variable(s) and the approximate ranked
distances between samples.

I mean, if all we talk about is the nMDS, who cares? It is the
implications of this for the system under study that are of interest.

That said, B-C is just one of many ways to think of distance so to my
mind I wouldn't even talk about the B-C distance either; the interest is
in differences between sites/samples. The relevance of B-C or some other
coefficient only comes in when considering if they are a good descriptor
of the distance between samples for the variables you are considering.

Cheers,

G

 Cheers, Jari Oksanen
 


Re: [R-sig-eco] [R] reception of (Vegan) envfit analysis by manuscript reviewers

2012-05-10 Thread Gavin Simpson
On Thu, 2012-05-10 at 13:17 +0200, Alan Haynes wrote:
 Hi all,
 
 I'm using envfit with some decomposition data currently but with a PCA
 result (via vegan::rda()). Is envfit still valid for PCA results? I
 guess it doesn't make so very much difference, just the interpretation
 is slightly different.
 Or am I barking up the wrong tree by using this approach?

It is perfectly valid and is introduced in Jongman et al. alongside PCA
and CA. We (well, Jari) wouldn't have written a method for objects of
class "cca" if it wasn't appropriate.

I suggest you look at ordisurf() though; in most of the projects I have
been involved in, the linearity assumption of envfit() is questionable.

If you want a bit more info on what ordisurf() is doing see my blog post
on the function: http://wp.me/pZRQ9-1x

HTH

G

 Cheers,
 
 Alan
 
 --
 Email: aghay...@gmail.com
 Mobile: +41794385586
 Skype: aghaynes
 
 

Re: [R-sig-eco] bootstrapping gam models with multiple explanatory terms

2012-05-04 Thread Gavin Simpson
Dear Basil,

You are describing a non-parametric bootstrap procedure. There is
nothing wrong with what you are doing, but the approach you are taking
to extract/plot the smooth terms is a little too simplistic and hence
you've hit a brick wall.

To bootstrap multiple terms, I would have my training data in a data
frame, say `train`, and draw a bootstrap sample from that. I would fit
my model in the loop like this

boot.mod <- gam(y ~ s(x, bs = "cr") + s(z, bs = "cr"),
                data = train[k.star, , drop = FALSE])

notice how I am not fiddling with the model representation just the data
object used to fit the model.

Then I would use `predict(boot.mod, newdata, type = "terms")`, not
`type = "link"` (the default). `type = "terms"` returns a matrix of
contributions of the terms in the model. In the above model you'd have a
matrix with 2 columns, one for the smooth on `x` and one on `z`. These are
the centred smooth functions. To relate this to the values you were
producing with `predict()` in your example note that

predterms <- predict(boot.mod, newdata, type = "terms")
pred <- attr(predterms, "constant") + rowSums(predterms)

gives object `pred` which should be equivalent to
`predict(boot.mod, newdata)`. The constant is the model intercept, which
is an attribute of the returned object.

You could then plot each column of `predterms` against the relevant column
from `newdata` to give the bootstrap smooth for each term. I would do
that with `lines()` to add them to the plot.
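
Putting the pieces together, a sketch of the whole loop might look like the following. It assumes `mgcv::gam()` and uses simulated data in place of the real `train`; the evaluation grid `newdata`, the number of resamples and the use of matplot() rather than repeated `lines()` calls are my own choices for illustration:

```r
library(mgcv)

set.seed(2)
n <- 200
train <- data.frame(x = runif(n), z = runif(n))        # simulated stand-in data
train$y <- sin(2 * pi * train$x) + (train$z - 0.5)^2 + rnorm(n, sd = 0.3)

# Grid on which every bootstrap smooth is evaluated
newdata <- data.frame(x = seq(0, 1, length.out = 100),
                      z = seq(0, 1, length.out = 100))

model <- gam(y ~ s(x, bs = "cr") + s(z, bs = "cr"), data = train)
full  <- predict(model, newdata, type = "terms")

nboot  <- 50
boot.x <- boot.z <- matrix(NA_real_, nrow(newdata), nboot)
for (i in seq_len(nboot)) {
  k.star   <- sample(n, replace = TRUE)                # resample rows
  boot.mod <- gam(y ~ s(x, bs = "cr") + s(z, bs = "cr"),
                  data = train[k.star, , drop = FALSE])
  predterms   <- predict(boot.mod, newdata, type = "terms")
  boot.x[, i] <- predterms[, "s(x)"]                   # centred smooth of x
  boot.z[, i] <- predterms[, "s(z)"]                   # centred smooth of z
}

op <- par(mfrow = c(1, 2))
matplot(newdata$x, boot.x, type = "l", lty = 1, col = "grey",
        xlab = "x", ylab = "s(x)")
lines(newdata$x, full[, "s(x)"], lwd = 2)
matplot(newdata$z, boot.z, type = "l", lty = 1, col = "grey",
        xlab = "z", ylab = "s(z)")
lines(newdata$z, full[, "s(z)"], lwd = 2)
par(op)
```

Each grey line is one bootstrap estimate of a centred smooth and the black line is the fit to the full data, which mirrors the single-term plot in the original question.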

In addition, note that you could sample from the posterior distribution
of the model to generate something similar; the splines are associated
with parameters and the model estimates for these parameters form a
multivariate normal distribution. You can take random draws from the
distribution to examine the variation in the shapes that could be taken
by the fitted splines given the uncertainty in fitting. Simon Wood has
an example of this in his book "Generalized Additive Models: An
Introduction with R" and I used this in a blog post recently:

http://wp.me/pZRQ9-2j
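
A sketch of that posterior simulation, assuming `mgcv` for the model and `MASS::mvrnorm()` for the multivariate normal draws. The data are simulated and the number of draws is arbitrary:

```r
library(mgcv)
library(MASS)

set.seed(3)
n   <- 200
dat <- data.frame(x = runif(n))                       # simulated stand-in data
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)

m <- gam(y ~ s(x, bs = "cr"), data = dat, method = "REML")

newdata <- data.frame(x = seq(0, 1, length.out = 100))
Xp <- predict(m, newdata, type = "lpmatrix")          # maps coefficients to fits

# 25 draws of the coefficients from their approximate posterior
beta.sim <- mvrnorm(25, mu = coef(m), Sigma = vcov(m))
fits <- Xp %*% t(beta.sim)                            # one column per draw

matplot(newdata$x, fits, type = "l", lty = 1, col = "grey",
        xlab = "x", ylab = "fitted")
lines(newdata$x, Xp %*% coef(m), lwd = 2)
```

Unlike the bootstrap, this refits nothing: every grey curve comes from the single fitted model, so it is much cheaper but conditions on the estimated smoothing parameters.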

Simon's book also has an example of a parametric bootstrap where the
resampling is done from the model residuals to create new data from the
fitted model and then fit a new model to the new data.

HTH

G

On Thu, 2012-05-03 at 20:36 -0500, Basil Iannone wrote:
 Hello R users,
 
 I may be thinking about this all wrong. If I am, please let me know.
 
 My question is about bootstrapping gam models. I want to know if there is a
 way to produce bootstrapped smoothers in a gam model that has more than one
 significant explanatory term. I know how to achieve this when the model
 only has one term. The code that I am using to do so is below. It produces
 a nice graph of y values in response to my explanatory variable (x), along
 with bootstrapped smoothers in a grey color and the smoother produced by
 the gam package in black. In my mind this is a very useful graph to help
 determine the realness of the fitted smoother that gam produces. I should
 note a thanks to Charles Geyer who had code on his STATS 5601 website that
 has helped me immensely thus far.
 
 x <- SI1 ## my explanatory variable
 y <- NO3.sp3 ## my dependent/response variable
 
 model <- gam(y ~ s(x, bs = "cr")) ## I know I can use other smoother types.
 ## I am still experimenting with this aspect.
 plot(x, y)
 curve(predict(model, newdata = data.frame(x = x)), add = TRUE)
 n <- length(x)
 nboot <- 100
 for (i in 1:nboot) {
   k.star <- sample(n, replace = TRUE)
   x.star <- x[k.star]
   y.star <- y[k.star]
   model.star <- gam(y.star ~ s(x.star, bs = "cr"))
   curve(predict(model.star, data.frame(x.star = x)),
         add = TRUE, col = "grey")
 }
 points(x, y, pch = 16)
 curve(predict(model, newdata = data.frame(x = x)), add = TRUE, lwd = 2)
 
 But now lets say I have a model with two explanatory variables ( x and z)
 and for the sake of this discussion, both terms are significant (i.e., the
 data is better explained by having them in the model). The model in this
 case would be:
 
 new.model <- gam(y ~ s(x, bs = "cr") + s(z, bs = "cr"))
 
 I know by using plot (new.model, resid = T, pch = 16) I can see the
 smoothers and 95% CI produced by the gam package with the data values
 overlayed.
 
 Is there a way to produce bootstrapped smoothers for each term from the
 same model (new.model) so that I can visualize them in the same way as the
 plot function allows me? I thought of having separate single-term models
 for each explanatory variable (y~s(x) and y~s(z)), but doing so is not
 correct since removing one term from a model causes the parameter values of
 the other term to change. Any suggestions on the bootstrapping code that
 would produce such a graphical output would be appreciated, especially
 since I am new to the bootstrapping coding.
 
 I hope that is clear and thanks in advance for any help that anyone can
 offer.
 


Re: [R-sig-eco] Testing difference between diversity indices with vegan::oecosimu

2012-04-26 Thread Gavin Simpson
On Thu, 2012-04-26 at 00:36 -0400, David Valentim Dias wrote:
 Hello Cichini,
 
 I cannot help with your code, but it seems like you have a silly hypothesis.
 Think about it: what is the probability of two communities being identical?
 You need to restate it in some more useful way. We already know most things
 are different, but by what magnitude? Which factors are causing these
 changes? How do these changes matter for the environment and for us?

Surely if we knew the two things were different there would be no need
to test if they were? Most statistics assumes a Null model as we can say
something specific about the magnitude of the difference (it is zero)
and we can then see if the observations are consistent with that model.

I agree that subsequent analysis is required to understand why there are
differences, but we still need a mechanism to say, given the data
collected and the error processes, are the diversities of these two
samples the same?
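
As a minimal sketch of one such mechanism, a label-permutation test can be written in base R. Note that this is a different null model from the r2dtable simulation used in the original post: it assumes samples are exchangeable between the two groups under the null hypothesis. The community matrix `comm` is simulated and the helper names are hypothetical.

```r
## Simpson diversity (1 - sum(p^2)) of a vector of pooled abundances
simpson <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

## absolute difference in Simpson diversity between the two pooled groups
divdiff <- function(comm, gr) {
  abs(simpson(colSums(comm[gr == 1, , drop = FALSE])) -
      simpson(colSums(comm[gr == 2, , drop = FALSE])))
}

set.seed(42)
## hypothetical data: 20 samples (rows) by 30 species (columns)
comm <- matrix(rpois(20 * 30, lambda = 2), nrow = 20)
gr <- gl(2, 10)

obs <- divdiff(comm, gr)

## shuffle group labels among samples and recompute the statistic
nperm <- 999
perm <- replicate(nperm, divdiff(comm, sample(gr)))

## one-sided permutation p-value, counting the observed value itself
p.value <- (sum(perm >= obs) + 1) / (nperm + 1)
p.value
```

Whether exchangeability of samples is a sensible null depends on how the samples were collected, so this is a starting point rather than a recommendation.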

G

 2012/4/25 Chris Howden ch...@trickysolutions.com.au
 
  Why not try some type of ANOVA style glm?
 
  Chris Howden
 
  On 26/04/2012, at 7:19, Kay Cichini kay.cich...@gmail.com wrote:
 
   Hello all,
  
   I'd like to test whether total diversity differs between two communities. For
   each community several samples were taken, and abundances were collapsed over
   groups to compute total diversity for each group. I tried to use
   vegan::oecosimu to test the non-randomness of my statistic (the difference in
   Simpson diversity indices of the collapsed abundances); however, I am not
   quite sure whether I am overlooking possible pitfalls:
  
   library(vegan)
   data(dune)
  
   # a grouping variable:
   gr <- gl(2, nrow(dune)/2)
  
   divdiff <- function(x) abs(diversity(colSums(x[gr == 1, ]), "simp") -
                              diversity(colSums(x[gr == 2, ]), "simp"))
   # testing function:
   divdiff(dune)
  
   oecosimu(dune, divdiff, "r2dtable", nsimul = 1999)
   # oecosimu with 1999 simulations
   # simulation method r2dtable
   # alternative hypothesis: true mean is not equal to the statistic
   #           statistic        z     2.5%      50%  97.5% Pr(sim.)
   # statistic   0.00275 -0.20996  0.00013  0.00280   0.01     0.98
  
  [[alternative HTML version deleted]]
  
   ___
   R-sig-ecology mailing list
   R-sig-ecology@r-project.org
   https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R-sig-eco] Testing difference between diversity indices with vegan::oecosimu

2012-04-26 Thread Gavin Simpson


[R-sig-eco] COURSE: Analysing Palaeolimnological Data with R

2012-04-08 Thread Gavin Simpson
Dear List,

Analysing Palaeolimnological Data with R

16th - 20th August 2012
University Marine Biological Station Millport, Isle of Cumbrae, Scotland

Steve Juggins and I will be running a 4-day residential R course to
coincide with the International Paleolimnology Symposium this August.
The course will be held at the University Marine Biological Station
Millport on the Isle of Cumbrae, within easy reach of Glasgow and the
IPS2012 venue.

The course costs £300 + VAT and includes food, accommodation and course
materials. We have support from PAGES to assist with costs for young
researchers from developing countries (see below). Please note that you
do *not* need to register for IPS2012 to attend the R course; it is open
to anyone.

The course will cover many topics of particular interest to
palaeolimnologists and palaeoecologists. Details can be found on the
course website:

http://www.staff.ncl.ac.uk/staff/stephen.juggins/courses/PalaeoDataWithR.htm

Full course details can also be downloaded as a PDF:

http://www.staff.ncl.ac.uk/staff/stephen.juggins/courses/PalaeoData_With_R_Course.pdf

To register for the course, please visit:

http://webstore.ncl.ac.uk/browse/extra_info.asp?compid=1&modid=5&prodid=85&deptid=24&catid=108&CourseDate=121

** Support for young researchers from developing countries **
Thanks to the generous support of PAGES we are able to cover travel,
subsistence and course costs (up to £1100) for five young researchers
from developing countries. If you would like to apply for PAGES
financial support please send a CV and short covering letter outlining
your research interests and why the course will benefit you to both
Gavin and Steve (our contact details are on the course website). The
deadline for applying for PAGES support is 15th May 2012.

If you have any further questions, please do not hesitate to contact me.

We look forward to seeing you in August.

Gavin


Re: [R-sig-eco] MAFA function

2012-04-03 Thread Gavin Simpson
See:

http://dx.doi.org/10.1051/alr/2009020

There is R code in the supplementary materials.

HTH

G

On Tue, 2012-04-03 at 10:50 -0400, Katherine Mills wrote:
 Could any of you point me towards a package that supports a min/max
 autocorrelation factor analysis in R?  Or has anyone coded a function for
 this analysis that you would be willing to share?
 
 Thanks,
 Kathy Mills
 


Re: [R-sig-eco] chronological clustering and dynamic factor analysis

2012-03-12 Thread Gavin Simpson
On Sun, 2012-03-11 at 22:30 -0500, Eugenia Bragina wrote:
 Dear R people,
 Does anybody know of any R functions to perform dynamic factor
 analysis and chronological clustering? A couple of years ago there was
 no R function for CC or DFA, but maybe somebody has written one since?
 Many thanks in advance!
 Eugenia

Dear Eugenia,

For chronological clustering there are a couple of options. First is the
rioja package of Steve Juggins (on CRAN) which has a couple of types of
clustering available - see `chclust()`. A second option is to use a
multivariate tree, with the single constraint being the time variable.
For example, using a multivariate tree, via package mvpart, clusters are
found by minimising a within group sums of squares with the constraint
that samples are maintained in temporal order.
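
To illustrate the idea only, here is a minimal base R sketch of the first split such a tree would make: every possible cut point along the time axis is tried, and the one giving the smallest total within-group sum of squares is kept. This is not the mvpart or rioja implementation; the data matrix `Y` and the helper `wss` are hypothetical.

```r
set.seed(1)
time <- 1:30
## hypothetical 3-species series with a shift in composition after t = 18
Y <- cbind(sp1 = rnorm(30, mean = ifelse(time <= 18, 5, 9)),
           sp2 = rnorm(30, mean = ifelse(time <= 18, 8, 3)),
           sp3 = rnorm(30, mean = 4))

## within-group sum of squares about the group centroid
wss <- function(M) sum(scale(M, center = TRUE, scale = FALSE)^2)

## candidate cut points: the split falls between sample k and sample k + 1,
## so samples always stay in temporal order; keep at least 2 per group
cuts <- 2:(nrow(Y) - 2)
total.wss <- sapply(cuts, function(k) {
  wss(Y[1:k, , drop = FALSE]) + wss(Y[(k + 1):nrow(Y), , drop = FALSE])
})

best <- cuts[which.min(total.wss)]
best            # last sample of the first (older or younger) zone
min(total.wss)  # compare with wss(Y), the unsplit sum of squares
```

A full tree repeats this search within each resulting group, and some stopping rule (e.g. cross-validation) is then needed to decide how many zones to keep.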

I'm not sure of your background/area of interest, but the type of
clustering that mvpart can give you has long been known in
palaeoecological circles (in fact even before the CART book came out).
If you would like to know more about the mvpart way of doing things,
contact me off-list and I can send you a book chapter I have written
that includes an example. There is accompanying code available too (as R
scripts, not inline in the chapter).

As for DFA, see the MARSS package:

http://cran.r-project.org/web/packages/MARSS/index.html

DFA is one of the cases of the models fitted by the package. I haven't
got round to using it myself, but it didn't look like something for the
faint hearted when I took a cursory look 6 months ago. However IIRC
there is a good user guide (Vignette) provided with the package which
includes a DFA example.

HTH

G



Re: [R-sig-eco] Co-correspondence analysis

2012-03-01 Thread Gavin Simpson
On Wed, 2012-02-29 at 18:44 +0100, Duarte Viana wrote:
 Dear all,
 
 I want to relate two matrices of species presence-absence data,
 preferably in a predictive manner. Is it reasonable to do this by
 co-correspondence analysis using the coca function in the
 cocorresp package? Some papers seem to use presence-absence data,
 but the help document for this function only refers to abundance data.
 
 Any insights into this would be greatly appreciated.
 
 Duarte

As CoCA is based very much on the ideas of weighted averaging and
correspondence analysis (CA), and CA can be used on presence-absence
data, yes, presence-absence data can be analysed using CoCA.

HTH

G



Re: [R-sig-eco] non-metric MDS comparison - vegan vs. ecodist packages

2012-02-20 Thread Gavin Simpson
On Mon, 2012-02-20 at 16:28 +0100, Gian Maria Niccolò Benucci wrote:
 Hi r-sig-ecology Members,
 
 I would like to ask you to examine the differences I found between the
 metaMDS() and nmds() outputs and possibly draw some conclusions. As I wrote
 in past emails, I have 24 samples of ectomycorrhizal fungi grouped into two
 different communities (12 natural and 12 cultivated).
 
 Here is the code:
 
  metaMDS(data_matrix2, distance="bray", k=2, autotransform=F) -> metaMDS
  metaMDS
 
 Call:
 metaMDS(comm = data_matrix2, distance = "bray", k = 2, autotransform = F)
 
 global Multidimensional Scaling using monoMDS
 
 Data: data_matrix2
 Distance: bray
 
 Dimensions: 2
 *Stress: 0.04536661 *
 Stress type 1, weak ties
 No convergent solutions - best solution after 20 tries

Perhaps you could try running metaMDS with random starts until it does
find convergent solutions?! Why expect metaMDS() to do better in 20
tries than nmds() with 100 random tries?

add `trymax = 100` and `halfchange = FALSE` to try to make the two
functions more comparable.

HTH

G

 Scaling: centring, PC rotation, halfchange scaling
 Species: expanded scores based on data_matrix2
 
  nmds <- nmds(dist_bray, mindim = 2, maxdim = 2, nits = 100)
 Using random start configuration
 Using random start configuration
 ...
  nmds_min
 X1  X2
 1  -0.66226262  0.16392824
 2  -0.68844987 -0.20891993
 3  -0.47717515  0.55323693
 4  -0.67213392 -0.09195847
 5   0.03305172  0.41844232
 6   0.31992006  0.53887910
 7  -0.61944875  0.34411146
 8   0.07013849  0.65723057
 9   0.53053436  0.37444633
 10  0.26704705  0.39849341
 11 -0.20981416  0.70391983
 12 -0.61620355 -0.43568743
 13  0.14219660 -0.69571194
 14  0.42365981 -0.43308767
 15  0.15840330 -0.12872863
 16 -0.00492407 -0.06889680
 17 -0.20513513 -0.58052156
 18  0.46546214  0.12019775
 19  0.36753709 -0.25048256
 20 -0.09446486 -0.61943616
 21  0.15010553 -0.31995009
 22  0.41540872 -0.29507200
 23  0.42573367 -0.12321584
 24  0.48081354 -0.02121688
  min(nmds$stress)
 *[1] 0.2787161*
  nmds$r2[which.min(nmds$stress)]
 [1] 0.6338372
 
 Is it possible to get such different stress values?
 
 Thanks for replying,
 


Re: [R-sig-eco] non-metric MDS comparison - vegan vs. ecodist packages

2012-02-20 Thread Gavin Simpson
On Mon, 2012-02-20 at 17:24 +0100, Gian Maria Niccolò Benucci wrote:
 Dear Gavin,
 
 Thank you very much. I tried with your advice but the result is almost
 unchanged...

 Any further advice??

Yeah, what Sarah said. Read the documentation; the two functions use
different definitions of stress.

You could always do a procrustes rotation on the two configurations to
see how well they compare.
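
For anyone unsure what a Procrustes rotation does, here is a minimal base R sketch of the underlying calculation: one configuration is centred, rotated/reflected and rescaled to match the other as closely as possible, and the residual sum of squares measures the remaining disagreement. vegan provides procrustes() and protest() for real use; the function `procrustes_rotate` and the two configurations below are hypothetical stand-ins for the metaMDS() and nmds() site scores.

```r
## Rotate, reflect and rescale Y to fit X (both n x k matrices of scores)
procrustes_rotate <- function(X, Y) {
  Xc <- scale(X, center = TRUE, scale = FALSE)   # centre both configurations
  Yc <- scale(Y, center = TRUE, scale = FALSE)
  sv <- svd(crossprod(Xc, Yc))                   # t(Xc) %*% Yc = U D V'
  R <- sv$v %*% t(sv$u)                          # optimal orthogonal rotation
  s <- sum(sv$d) / sum(Yc^2)                     # optimal scaling factor
  Yrot <- s * Yc %*% R
  list(Yrot = Yrot, rotation = R, scale = s,
       ss = sum((Xc - Yrot)^2))                  # residual sum of squares
}

set.seed(7)
X <- matrix(rnorm(48), ncol = 2)                 # 24 samples, 2 dimensions
theta <- pi / 3
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2)
Y <- 2.5 * X %*% rot + 10                        # same shape: rotated, rescaled, shifted

fit <- procrustes_rotate(X, Y)
fit$ss                                           # essentially zero here
```

In this toy case the two configurations are identical up to rotation, scale and shift, so the residual is negligible. With two real NMDS solutions, a small residual relative to sum(Xc^2) would suggest the ordinations agree even though their reported stress values are not comparable.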

G

 Gian
 
 
 
 


Re: [R-sig-eco] non-metric MDS comparison - vegan vs. ecodist packages

2012-02-20 Thread Gavin Simpson
On Mon, 2012-02-20 at 14:30 -0500, Sarah Goslee wrote:
 References are given in the help for nmds. In particular, nmds() uses
 Kruskal's formulation.

And you can read the code. Look at nmds, at the top of the function body
is an inline function sstress() which does the relevant computation,
IIUC.

G

 Sarah
 On Feb 20, 2012 1:48 PM, Gian Maria Niccolò Benucci
 gian.benu...@gmail.com wrote:
 
  Dear Gavin and Sarah,
 
  I've found how metaMDS() calculates the stress [using monoMDS() from vegan],
  but I cannot find how nmds() from ecodist does it. Could you please point me
  to the page where I can find it? Maybe it has escaped me...
  Thank you very much for your aid,
 
  Gian
 
 

Re: [R-sig-eco] singletons and doubletons in vegan

2012-02-01 Thread Gavin Simpson
On Wed, 2012-02-01 at 07:32 -0500, Canning-Clode, Joao wrote:
 Hi all,
 
 I am running species accumulation curves for several data sets, and am
 also playing with some estimators using this function in vegan:
 
 SF.estimators <- poolaccum(SF.data, permutations=1000)
 SF.estimators # shows data for Sobs, Chao, both Jacks and bootstrap
 plot(SF.estimators) # illustrates graphs for all estimators
 summary(SF.estimators, display="jack2") # gives jack2 stats
 
 Does anyone know how I can get the info about singletons and doubletons?

What info do you want?

Working out which spp are singletons and doubletons from the input data
is quite simple for the entire pool of samples (he says, hoping I i]
understood and ii] got this right ;-):

cs <- colSums(SF.data > 0)
singlet <- cs == 1L
doublet <- cs == 2L

singlet[which(singlet)]
doublet[which(doublet)]

E.g.

## Load example data set from vegan
data(BCI)

## compute the column sums of the presence/absence matrix,
## i.e. the number of samples in which each species occurs
cs <- colSums(BCI > 0)

## logicals for singleton and doubleton
singlet <- cs == 1L
doublet <- cs == 2L

## which species are...
singlet[which(singlet)]
doublet[which(doublet)]

## number of singletons and doubletons...
sum(singlet)
sum(doublet)

HTH

G

 best wishes
 
 João
 
 João Canning Clode, Ph.D
 Research Associate
 Smithsonian Environmental Research Center
 647 Contees Wharf Road
 Edgewater, MD 21037
 
 Email: canning-clo...@si.edu
 Web: http://www.canning-clode.com
 


Re: [R-sig-eco] Permutation: habitat diversity

2011-12-23 Thread Gavin Simpson
On Fri, 2011-12-23 at 18:41 +, Martin Wilkes wrote:
 We have counts of habitat types at three sites and want to permute the
 sets of counts between sites to get a distribution of habitat (H)
 diversity. We have tried several R functions (perm, sample, permtest)
 but none seems to do what we want.  We're happy writing the code to
 calculate the H distribution but need some help getting the
 permutations right if possible please.

Hi Martin,

It's not immediately clear to me what your data look like, nor what needs
to be shuffled and what doesn't. Could you expand, perhaps with a
simplified example or structure of the data and how you want to shuffle it?

In the meantime, take a look at the permute package as it may do what
you want. If it doesn't I'd be interested in knowing how you want to
permute your data as it might be easy to add this sort of thing to
permute at a later date.

All the best,

Gavin

 Thanks
 
 Martin Wilkes
 University of Worcester
 


Re: [R-sig-eco] envfit calculation

2011-12-20 Thread Gavin Simpson
On Mon, 2011-12-05 at 09:39 -0800, dan clune wrote:
 Hello R-Ecologists,
 I'm looking for clarification on how vectors in function envfit are
 calculated. I understand how to use the function, it is pretty
 straightforward. But after reading the documentation I am still
 wondering exactly how the vectors are calculated internally by the
 function. 

It is covered in Jongman, ter Braak & van Tongeren (1995) Data Analysis
in Community and Landscape Ecology. Cambridge University Press. It is in
the ordination chapter. I'm away from my copy at the moment so can't
point you to it directly I'm afraid.

However, that might not help you understand the R code as IIRC Jari has
some pretty efficient code to do fitting. If that was your aim, it might
help to speak to Jari on the vegan help or discussion lists on
r-forge.r-project.org.

I've been meaning to write a blog post about envfit for a wee while, but
that would only be from the point of view of what the method/idea is
doing, not from understanding the codes Jari used to implement it.
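
As a rough illustration of the idea only (not the vegan code), vector fitting can be sketched in base R as a regression of the environmental variable on the site scores: the arrow direction is the normalised vector of regression coefficients, the strength of the relationship is the R-squared of that regression, and significance is judged by permuting the variable. The data, the PCA used as a stand-in ordination, and all object names below are assumptions for illustration.

```r
set.seed(10)
nsites <- 40
env1 <- rnorm(nsites)                       # hypothetical environmental variable

## hypothetical community-like data in which three variables respond to env1
Y <- matrix(rnorm(nsites * 6), ncol = 6)
Y[, 1:3] <- Y[, 1:3] + env1

## site scores on two axes; PCA is only a stand-in for any ordination
ord <- prcomp(Y)$x[, 1:2]

## regress the variable on the site scores
fit <- lm(env1 ~ ord)
b <- coef(fit)[-1]                          # drop the intercept
direction <- b / sqrt(sum(b^2))             # unit-length arrow direction
r2 <- summary(fit)$r.squared                # strength of the fit

## permutation test: shuffle the variable, refit, compare R-squared values
nperm <- 999
r2.perm <- replicate(nperm, summary(lm(sample(env1) ~ ord))$r.squared)
p.value <- (sum(r2.perm >= r2) + 1) / (nperm + 1)

direction
r2
p.value
```

The arrow therefore points in the direction of most rapid change of the variable across the ordination plane. For weighted ordinations such as CCA the regression would need to use the site weights, which this sketch ignores.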

HTH

G

 Can anyone point me toward a more detailed explanation of the calculations?
 Thanks for any help,
 Dan
 


Re: [R-sig-eco] Problem with a loop

2011-11-28 Thread Gavin Simpson
Hi Momadou,

This is *not* a general help list for R. It is a SIG for ecological and
environment-related questions.

Your question would be suitable for the R-SIG-Geo list or the general
R-Help mailing list, but not here.

Gavin

On Mon, 2011-11-28 at 15:35 +, momadou sow wrote:
 Hi,
 I have working code to extract the pixel values of my image and store the
 result in a folder (see the correct code in the attached file). But as I
 have several images, I would like to write a loop that extracts all pixel
 values at once and stores the results in different matrices (Mat1, Mat2,
 Mat3, etc.) in the same folder.
 I tried this code but it is not correct:
 library(raster)
 a <- NULL
 for(i in 2002273:2002361){
 a <- (adresse("C:\\Users\\Momadou\\Desktop\\Subset\\Subset_Site1_NDII\\NDII_Subset_2002\\Sahel_Site1_",i,sep=""))

 RasterToColonne <- function(AdresseRaster){ # to transform raster to column
 a <- raster(adresse)
 CentreRaster <- rasterToPoints(a)
 PixelColonne <- as.data.frame(CentreRaster)
 PixelColonne <- cbind(PixelColonne,coordonnee =
 paste(PixelColonne[,1],PixelColonne[,2],sep=" "))
 colnames(PixelColonne) <-
 c("Longitude","Latitude","ValeurPixel","Coordonnees")
 return(PixelColonne)
 }
 Mat[i] <- RasterToColonne(adresse)
 write.table(Mat[i], "C:\\Users\\Momadou\\Desktop\\Mat[i].txt",sep=" ")
 Can you help me to solve this problem?
 Thanks in advance,
 Mamadou


Re: [R-sig-eco] interpreting adonis results

2011-11-25 Thread Gavin Simpson
On Sun, 2011-11-20 at 02:24 +0100, Gian Maria Niccolò Benucci wrote:
 Gavin,
 
 May I ask you if you have tutorial for the betadisper() function?

Nothing more than that which comes with vegan in the help or is
contained in the Vegan tutorial, which can be accessed via:

http://vegan.r-forge.r-project.org/

 Maybe I did a mistake during the identification of the groups of samples...

Yes, I think you did...

 Then I tried to use the betadisper() function and R gave me an error...
 ***
 > groups <- factor(c(rep(1,12),rep(13,24),
 rep(25,36),rep(37,48),rep(49,60),rep(61,72),rep(73,84),
 + labels=c("SIG","BA","SIMa","SIMb","SZV","BM","BRE")))
 
 > betadiv <- betadisper(dist1, groups)
 
 Error: (subscript) logical subscript too long
 
 ***
 I can't understand what is wrong...
 Thank you help and patience,

...your code for defining groups is fundamentally flawed; you are
including the labels vector inside the c() statement and so R thinks
these are just some more elements of the vector:

groups <- factor(c(rep(1,12), rep(13,24), rep(25,36), rep(37,48), 
                   rep(49,60), rep(61,72), rep(73,84), rep(25,36), 
                   rep(37,48), rep(49,60), rep(61,72), rep(73,84),
                 labels=c("SIG","BA","SIMa","SIMb","SZV","BM","BRE")))

> tail(groups)
labels2 labels3 labels4 labels5 labels6 labels7 
     BA    SIMa    SIMb     SZV      BM     BRE 
Levels: 1 13 25 37 49 61 73 BA BM BRE SIG SIMa SIMb SZV
> nlevels(groups)
[1] 14

I think you want this:

groups <- factor(c(rep(1,12), rep(13,24), rep(25,36), rep(37,48), 
                   rep(49,60), rep(61,72), rep(73,84), rep(25,36), 
                   rep(37,48), rep(49,60), rep(61,72), rep(73,84)),
                 labels=c("SIG","BA","SIMa","SIMb","SZV","BM","BRE"))

> levels(groups)
[1] "SIG"  "BA"   "SIMa" "SIMb" "SZV"  "BM"   "BRE"

Do check that code you use creates the expected results!
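
One more thing worth checking: `rep(13, 24)` repeats the value 13
twenty-four times, so the `rep()` calls in the original message build a
vector of 12 + 24 + 36 + 48 + 60 + 72 + 84 = 336 entries rather than 84.
If the 84 samples are stored in site order with 12 samples per site (an
assumption on my part; adjust if your data are laid out differently),
`gl()` builds the factor directly:

```r
## Assumes 7 sites x 12 consecutive samples each, in this site order.
sites <- c("SIG", "BA", "SIMa", "SIMb", "SZV", "BM", "BRE")
groups <- gl(n = 7, k = 12, labels = sites)

length(groups)   # one entry per sample
table(groups)    # number of samples per site
```

The length of `groups` should match the number of samples in the
dissimilarity object passed to betadisper().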

HTH

G

 Gian
 
 
 
 
 
 
 

Re: [R-sig-eco] interpreting adonis results

2011-11-25 Thread Gavin Simpson
On Fri, 2011-11-25 at 11:41 +0200, Jari Oksanen wrote:
<snip />
 Howdy Folks,
 
 It seems that empty groups (count is zero) are the source of the
 problem. You can drop them using factor() command for your interaction
 -- this drops empty levels:
 
 mod <- betadisper(betad, factor(groupA))
 
 works for me.

droplevels() would be what the cool kids would use, just to be hip ;-)
It's new to recent versions of R (>= 2.13.0 IIRC). For factors it is
just a roundabout way of doing `factor(myfac)`:

> droplevels.factor
function (x, ...) 
factor(x)
<environment: namespace:base>

but there is a data.frame method which drops empty levels for all
factors in a data frame, and package authors can write other methods
which might become useful in various places in the future. So good to
know that `droplevels()` exists.
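
A small illustration of both methods, using made-up factors (the objects
`f` and `df` are just for the example):

```r
## A factor with an unused level "c".
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))
nlevels(f)               # counts the empty level too
nlevels(droplevels(f))   # empty level removed

## The data.frame method cleans every factor column in one go.
df <- data.frame(f = f,
                 g = factor(c("x", "x", "y"), levels = c("x", "y", "z")))
sapply(droplevels(df), nlevels)
```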

 However, like Gav said, this may not be a meaningful test (but still
 calls for a fix in vegan).

I'll take a look at adding a test/fix to vegan.

G

  If I remember correctly, Marti Anderson had
 interactions in her software, but that was dropped from current
 versions. 
 
 Cheers, jari oksanen
 
   For the dune dataset, the above error occurred. But for my data, the 
   interaction works. I wonder, if I continue to use the new factor generated 
   from the factor interaction for betadisper, will it affect the 
   results? If this is wrong, what would be the recommended function to use?
  
  I would need to check more carefully, but this may not be what Marti
  Anderson's software would fit as an interaction model. I may be
  misremembering, but IIRC PERMDISP can handle two-way ANOVA type models
  within her framework and betadisper is not set-up for that. What you
  have done is look at whether certain combinations of your two factors of
  interest are more variable/dispersed than others. Which seems a
  reasonable hypothesis to me.
  
  G
  
   Sincerely Yours,
   J
   
   On Nov 17 2011, Kay Cecil Cichini wrote:
   
   ..to be safe I would consider excluding an effect due to different  
   multivariate spread. See chapter 5.2, Homogeneity of groups and beta  
   diversity, in the vegan tutorial at  
   http://cc.oulu.fi/~jarioksa/opetus/metodi/vegantutor.pdf.
   
   best,
   kay
   
   

Re: [R-sig-eco] interpreting adonis results

2011-11-17 Thread Gavin Simpson
On Thu, 2011-11-17 at 10:01 +0100, gabriel singer wrote:
 ... dangerous wording, there could in fact be a location effect of 
 'location' and/or a dispersion effect of 'location'.

Well of course, but I was assuming that the assumptions of the test were
met! ;-)

 Gian, I suggest you add a test of a dispersion effect using the function 
 betadisper(), then you know a bit more about the type of effect.

Indeed, and as the author of that function I too would suggest that the
homogeneity assumption be tested.
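
The idea behind the test can be sketched in base R for the simple Euclidean
case: compute each sample's distance to its own group centroid and compare
those distances among groups. This is a rough illustration with the
built-in `iris` data as a stand-in, not the betadisper() code, which works
from a dissimilarity matrix via principal coordinates.

```r
## Illustrative only: distances to group centroids in Euclidean space.
X <- as.matrix(iris[, 1:4])          # stand-in "community" data
g <- iris$Species                    # stand-in grouping factor

## Group centroids: one row per group, one column per variable.
cent <- apply(X, 2, function(col) tapply(col, g, mean))

## Distance of every sample to the centroid of its own group.
d_cent <- sqrt(rowSums((X - cent[as.character(g), ])^2))

## Do the groups differ in average distance to centroid (dispersion)?
anova(lm(d_cent ~ g))
```

With real community data you would pass the dissimilarities and the
grouping factor to betadisper() and test the result with anova() or
permutest().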

G

 gabriel
 


Re: [R-sig-eco] interpreting adonis results

2011-11-16 Thread Gavin Simpson
On Wed, 2011-11-16 at 03:43 +0100, Gian Maria Niccolò Benucci wrote:
 Hi all,
 
 I had 84 samples collected in 7 different sites.
 In each sample were individuated the different fungal species and recorded.
 I would test if exist a real difference between the sites and if exist a
 sort of site effect that structure the fungal communities...
 Then, I did adonis test
 
 > adonis(community.sq ~ location, data=env.table, permutations=999)
 
 Call:
 adonis(formula = community.sq ~ location, data = env.table, permutations = 999)
 
           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
 location   6    12.593 2.09886  6.8867 0.34922  0.001 ***
 Residuals 77    23.467 0.30477         0.65078           
 Total     83    36.060                 1.00000           
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
 
 
 The significance is  R2=0.349 at P=0.001
 Can I assure that exist a strong site effect in structuring the communities
 in each site?

Depends. The test is one of no effect of `location`. You have found
evidence against this hypothesis and thus could reject this hypothesis,
instead accepting the alternative hypothesis that there is an effect of
`location`. As to the strength of this effect? ~35% of the sums of
squares can be explained by `location`. Substantially more of the
variance remains unexplained. As I know nothing about your subject area,
I am unable to comment further on the strength of the relationship.
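
To see where those numbers come from, the R2 and pseudo-F can be recomputed
by hand from the sums of squares and degrees of freedom printed in the
table above (the variable names are just for this illustration):

```r
## Values copied from the adonis table above.
ss_location <- 12.593; df_location <- 6
ss_resid    <- 23.467; df_resid    <- 77

## Proportion of the total sum of squares explained by location.
r2 <- ss_location / (ss_location + ss_resid)

## Pseudo-F: ratio of the two mean squares.
f <- (ss_location / df_location) / (ss_resid / df_resid)

round(r2, 5)
round(f, 4)
```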

Many ecologists whose work I read would say an effect is
significant if the p-value was <= 0.05. Not that I subscribe to this way
of working, but by that criterion, you have identified a significant
`location` effect.

HTH

G

 
 Thanks for helping,
 
 G.
 


Re: [R-sig-eco] aov vs. glm

2011-11-11 Thread Gavin Simpson
On Thu, 2011-11-10 at 18:22 -0500, Lara R. Appleby 04 wrote:
 I'm trying to basically do a two way ANOVA on the dependent variable
 (clutchsize) with the two independent variables (treatment and species).
 It seems that there are three ways I can say this in R:
 
 1. glm(clutchsize~treatment*species)
 2. aov(clutchsize~treatment*species)
 3. anova(lm(clutchsize~treatment*species))
 
 Methods 2 and 3 yield equivalent results, but Method 1 is completely
 different!

I don't believe you; care to provide supporting evidence?

Here is a counter example that shows that these are equivalent:

## From Venables and Ripley (2002) p.165.
data(npk, package="MASS")
 
## Set orthogonal contrasts.
op <- options(contrasts=c("contr.helmert", "contr.poly"))

npk.glm <- glm(yield ~ block + N*P*K, data = npk)
npk.aov <- aov(yield ~ block + N*P*K, data = npk)
npk.lm  <- lm(yield ~ block + N*P*K, data = npk)
anova(npk.glm, test = "F")
anova(npk.lm)
summary(npk.aov)
options(op)

All three give the same results:

> anova(npk.glm, test = "F")
Analysis of Deviance Table

Model: gaussian, link: identity

Response: yield

Terms added sequentially (first to last)


      Df Deviance Resid. Df Resid. Dev       F   Pr(>F)   
NULL                     23     876.37                    
block  5   343.30        18     533.07  4.4467 0.015939 * 
N      1   189.28        17     343.79 12.2587 0.004372 **
P      1     8.40        16     335.39  0.5441 0.474904   
K      1    95.20        15     240.19  6.1657 0.028795 * 
N:P    1    21.28        14     218.90  1.3783 0.263165   
N:K    1    33.14        13     185.77  2.1460 0.168648   
P:K    1     0.48        12     185.29  0.0312 0.862752   
N:P:K  0     0.00        12     185.29                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> anova(npk.lm)
Analysis of Variance Table

Response: yield
          Df Sum Sq Mean Sq F value   Pr(>F)   
block      5 343.29  68.659  4.4467 0.015939 * 
N          1 189.28 189.282 12.2587 0.004372 **
P          1   8.40   8.402  0.5441 0.474904   
K          1  95.20  95.202  6.1657 0.028795 * 
N:P        1  21.28  21.282  1.3783 0.263165   
N:K        1  33.14  33.135  2.1460 0.168648   
P:K        1   0.48   0.482  0.0312 0.862752   
Residuals 12 185.29  15.441                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> summary(npk.aov)
            Df Sum Sq Mean Sq F value   Pr(>F)   
block        5 343.29  68.659  4.4467 0.015939 * 
N            1 189.28 189.282 12.2587 0.004372 **
P            1   8.40   8.402  0.5441 0.474904   
K            1  95.20  95.202  6.1657 0.028795 * 
N:P          1  21.28  21.282  1.3783 0.263165   
N:K          1  33.14  33.135  2.1460 0.168648   
P:K          1   0.48   0.482  0.0312 0.862752   
Residuals   12 185.29  15.441                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
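
If you would rather check the agreement programmatically than by eye,
something along these lines should do. It is only a sketch: it uses the
copy of `npk` that ships with R's datasets package and the default
contrasts (the sequential F tests do not depend on the contrasts), and the
object names are made up for the example.

```r
## Fit the same Gaussian model two ways.
fit.glm <- glm(yield ~ block + N*P*K, data = npk)
fit.lm  <- lm(yield ~ block + N*P*K, data = npk)

tab.glm <- anova(fit.glm, test = "F")   # analysis of deviance with F tests
tab.lm  <- anova(fit.lm)                # classical ANOVA table

## Compare the F statistics term by term, matching on row names.
trms  <- c("block", "N", "P", "K", "N:P", "N:K", "P:K")
f.glm <- tab.glm[trms, "F"]
f.lm  <- tab.lm[trms, "F value"]
all.equal(f.glm, f.lm, tolerance = 1e-6)
```

Note that it is anova() on the glm fit, with `test = "F"`, that is compared
here; summary() of a glm prints coefficient tests, which is a different
table.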

HTH

G

 
 Any idea why?
 
 Lara Appleby
 


Re: [R-sig-eco] aov vs. glm

2011-11-11 Thread Gavin Simpson


Re: [R-sig-eco] Question on using strata in a pRDA

2011-11-08 Thread Gavin Simpson
On Tue, 2011-11-08 at 13:19 -0200, Diogo B. Provete wrote:
 Dear list,
 I've got an issue with using a partial RDA (space with PCNM + environment)
 coupled with variation partitioning, with randomization (strata).
 
 1) I sampled 11 ponds during 6 months, obtaining both tadpole species
 abundance (species) and environmental data (env). I was running
 anova() to test the significance of the rda with randomization only between
 ponds, by using the function with(). I'd like to know if my procedures are
 correct.
 
 > rda.global =
 rda(species~ph+temp+CO+cond+area+as.factor(month)+as.factor(pond)+
 Condition(pcnm.matrix), na.action=na.exclude, data=env)
 
 > anova(rda.global)
 
 Permutation test for rda under reduced model
 
 Model: rda(formula = species.pad ~ temp + pH + DO + turb + cond + veg +
 canopy + dep + area + as.factor(month) + as.factor(pools) +
 Condition(as.matrix(pcnm.mat)), data = abiotic.pad, na.action = na.exclude)
          Df      Var      F N.Perm Pr(>F)   
 Model    15 0.212182 3.7405    199  0.005 **
 Residual 16 0.060508                        
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
 
 > with(env, anova(rda.global, strata=pond))
 Permutation test for rda under reduced model
 Permutations stratified within 'pools'
 
 Model: rda(formula = species.pad ~ temp + pH + DO + turb + cond + veg +
 canopy + dep + area + as.factor(month) + as.factor(pools) +
 Condition(as.matrix(pcnm.mat)), data = abiotic.pad, na.action = na.exclude)
          Df      Var      F N.Perm Pr(>F)   
 Model    15 0.212182 3.7405    199   0.01 **
 Residual 16 0.060508                        
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
 As you can see, the F statistics are precisely the same in the two
 procedures, but the p-values are different. Is that right?

Why do you wish for the permutation to be stratified by a variable that
you included in the analysis?

Anyway, you are testing the *observed* F against a null distribution of
F under permutation. The null distribution so generated will depend on
the permutation and permutation scheme used. The *observed* F will not
change, just the permutation p-value of this observed F stat.
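
A toy version of this in base R may make the point clearer. Everything here
is made up for illustration (simulated data, a simple regression F rather
than an RDA, and `block` standing in for ponds); it is not what anova() for
rda objects does internally. The observed F is computed once, and only the
null distribution changes with the permutation scheme.

```r
set.seed(42)
block <- gl(4, 6)                       # hypothetical strata, 6 samples each
x <- rnorm(24)                          # hypothetical predictor
y <- 0.8 * x + rnorm(24) + as.numeric(block)

## F statistic of a simple regression of y on x.
f_stat <- function(y, x) summary(lm(y ~ x))$fstatistic[["value"]]
f_obs <- f_stat(y, x)                   # observed F: the same for both schemes

## Null distributions: free shuffling vs shuffling within strata only.
nperm <- 199
f_free  <- replicate(nperm, f_stat(sample(y), x))
f_strat <- replicate(nperm, f_stat(ave(y, block, FUN = sample), x))

## Permutation p-values of the same observed F under each scheme.
p_free  <- (sum(f_free  >= f_obs) + 1) / (nperm + 1)
p_strat <- (sum(f_strat >= f_obs) + 1) / (nperm + 1)
c(F = f_obs, p_free = p_free, p_strat = p_strat)
```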

 2) I have some NAs in my original environmental data matrix, but I couldn't
 remove them using the na.action argument of rda. Thus, I couldn't actually
 run the varpart function. Do you know how I can cope with NAs in
 varpart?

Remove them first by applying na.omit():

dat2 <- na.omit(dat)
varpart(, data = dat2)
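
One thing to watch, assuming the species matrix and the environmental data
are separate, row-aligned objects (the names `spp` and `env` and the values
below are invented for the example): rows dropped from one must also be
dropped from the other.

```r
## Toy, row-aligned data: env has missing values, spp does not.
env <- data.frame(pH = c(6.1, NA, 7.0, 6.5), temp = c(20, 22, NA, 25))
spp <- data.frame(sp1 = c(3, 0, 5, 1), sp2 = c(0, 2, 2, 4))

keep <- complete.cases(env)             # TRUE for rows with no NA in env
env2 <- env[keep, , drop = FALSE]
spp2 <- spp[keep, , drop = FALSE]       # drop the same rows from the species data

nrow(env2) == nrow(spp2)
```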

HTH

G

 Thank you all for your attention and (future) willingness to give me some
 feedback.
 
 Diogo
 
 


Re: [R-sig-eco] Dynamic factor analysis

2011-11-07 Thread Gavin Simpson
On Mon, 2011-11-07 at 16:38 -0500, Katherine Mills wrote:
 I am wondering if anyone knows of a R package that runs dynamic factor
 analysis, as described in Zuur et al, 2003.  (Estimating common trends in
 multivariate time series using dynamic factor analysis.  Environmetrics 14:
 665-685.)
 
 My initial searches have not been productive, but maybe this technique is
 listed under a different name??  Thanks in advance for any guidance.

See the MARSS package

http://cran.r-project.org/web/packages/MARSS/index.html

HTH

G

 Kathy Mills
 


Re: [R-sig-eco] (no subject)

2011-10-17 Thread Gavin Simpson
On Mon, 2011-10-17 at 15:33 +0200, Kerstin Kober wrote:
 Hi,
 This is a question from an R-newby! 
 I’ve got a very extensive data set and will need to run a large number of 
 Mann-Whitney U tests (in R: wilcox.test) between several sets of data. I am 
 trying to automate this as far as possible so I won’t need to run each test 
 one by one. I asked already somebody for help which provided a short piece of 
 code which enables me to run several tests one after another. However, I will 
 need to re-adjust this code as there are missing values and it stops every 
 time it encounters this.
 
 
 The data (I’ve attached a small file with some example data, so you can have 
 an idea how it looks like and run code on it):
 We’ve got bird density data from various locations, collected over the course 
 of several years. One location (“0”) is our control, which is supposed to be 
 tested against each of the other locations (“1” – “3”). We would like to run 
 this test only on data collected during the same year, so we would test “0” 
 vs “1” in 1980, “0” vs “1” in 1981 etc.
 
 This is the code I’ve got so far and which works fine as long as there exists 
 data from all years in all locations:
 
 for(i in 1980:1983){
 for(j in 1:3){ 
 tmp <- d[d$year == i & d$hotspot %in% c(0, j), ]
 print(c(i, j))
 print(wilcox.test(densities~factor(hotspot), data= tmp))
 }
 }

Two options (not immediately clear where the missingness issue arises):

1) From ?wilcox.test we note that the formula method accepts (as is
common to most R functions that employ a formula interface) an na.action
argument.

Try:

print(wilcox.test(densities~factor(hotspot), data= tmp, 
  na.action = na.omit))

for example. See ?na.omit for details and other possible options. But
this should have defaulted to na.omit. What does:

> getOption("na.action")
[1] "na.omit"

return for you?

2) If the problem is that the `tmp` you are creating contains no rows or
certain locations are missing, then wrap the `print(wilcox.test())`
in `try()`. This will try to evaluate the function and catch any
errors, allowing your loop to continue should an error happen.

Having given you the rope, duty demands that I at least mention that
this seems a strange, if not downright dangerous, thing to be doing. Not
least because you are not doing any adjustment of p-values. I most
certainly wouldn't believe that the p-values printed in the output have
their usual meaning. Especially if the number of tests performed is, as
you say, large.

If you save each of the wilcox.test objects you could grab the p-values
later and adjust them: ?p.adjust
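
Putting those pieces together, a sketch along these lines might do. The
data are simulated to mimic the layout you describe (the column names
follow your loop; the values, and the missing location-year, are made up),
and Holm's method is just one of the options in ?p.adjust.

```r
## Simulated stand-in data: 8 counts per location per year.
set.seed(1)
d <- expand.grid(rep = 1:8, hotspot = 0:3, year = 1980:1983)
d$densities <- rexp(nrow(d))
d <- d[!(d$year == 1982 & d$hotspot == 2), ]   # one location-year is missing

## One row per test: control (0) vs each location, within each year.
res <- expand.grid(hotspot = 1:3, year = 1980:1983)
res$p <- NA_real_

for (k in seq_len(nrow(res))) {
  tmp <- d[d$year == res$year[k] & d$hotspot %in% c(0, res$hotspot[k]), ]
  ## try() catches the error raised when one of the two groups is absent.
  fit <- try(wilcox.test(densities ~ factor(hotspot), data = tmp),
             silent = TRUE)
  if (!inherits(fit, "try-error")) res$p[k] <- fit$p.value
}

## Adjust for multiple testing; tests that failed stay NA.
res$p.holm <- p.adjust(res$p, method = "holm")
res
```

Storing the results in a data frame rather than printing them also makes it
easy to see which year and location combinations could not be tested.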

HTH

G

 Now, the problem is, that I don’t have data from all years in all locations: 
 during some years some locations are missing and some years are missing 
 entirely.
 
 I tried to find my way through the R help files, but because I am not quite 
 sure where exactly I would need to insert information about the dealing with 
 missing values (is the problem with the for-loop or in the wilcox.test, 
 possibly in both???), I am not entirely sure how to do this.
 
 If anybody has an idea how to do this, please let me know.
 Thank you very much for your help!!!
 
 Kerstin
 