Re: [R] Question on CCA and RDA analysis

2015-04-22 Thread Gavin Simpson
You didn't look that hard...

Start with the Environmetrics Task View:

http://cran.r-project.org/web/views/Environmetrics.html

That might point you to the **vegan** pkg:

http://cran.r-project.org/web/packages/vegan/index.html

And that has a vignette on ordination which you might find useful.

I have been wondering if a blog post on such basic topics might be
worthwhile. Your email suggests perhaps it is. Once I'm done teaching I
will try to find some time to write some posts on these basic analyses.

HTH

G

On 10 April 2015 at 20:38, Luis Fernando García luysgar...@gmail.com
wrote:

 Yeah,

  The most useful example I found was this.

  https://gist.github.com/perrygeo/7572735.

 I always had the idea of this kind of forums was to provide sources not so
 obvious in the web. If you have something better it would be great.

 2015-04-10 18:36 GMT-03:00 Ben Bolker bbol...@gmail.com:

  Luis Fernando García luysgarcia at gmail.com writes:
 
  
   Dear R experts,
  
   I wanted to know if you can suggest me any website or tutorial just to
   learn about how to make a RDA or CDA in R
  
   Thanks in advance!
 
I hate to ask, but did you try Googling
 
  canonical correspondence analysis R
 
  ... ?
 
 
  __
  R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Gavin Simpson, PhD

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] package mgcv - predict with bam: Error in X[ind, ] :, subscript out of bounds

2014-02-03 Thread Gavin Simpson
The two distributions are different. The random effect is assumed to
be a Gaussian random variable, just as it is with the GLMMs in the
lme4 package. It is fine to use such a random effect within a GAM with
a non-Gaussian error distribution, like the ones you describe using.

HTH

Gavin

On 3 February 2014 15:00, William Shadish wshad...@ucmerced.edu wrote:
 Dear Simon, your note below says bs=re specifies a Gaussian random effect
 . I have been using bs = re for data modeled with Poisson and binomial
 distributions, or variants thereof (e.g., quasi-Poisson). Have I erred in
 assuming bs =re can be used to obtain random effects for such data? Will
 Shadish

 - Actually this is ok. mgcv exploits the duality between quadratically
 penalized smooths and Gaussian random effects to allow random effects to
 be specified this way. bs=re specifies a Gaussian random effect with
 corresponding model matrix given by model.matrix(~site-1). (More
 generally s(x,y,z,bs=re) specifies a gaussian random effect with model
 matrix given by model.matrix(~x:y:z-1), with obvious generalization to
 more or fewer variables). See mgcv help file ?random.effects for more.

 best,
 Simon


 --
 William R. Shadish
 Distinguished Professor
 Founding Faculty

 Mailing Address:
 William R. Shadish
 University of California
 School of Social Sciences, Humanities and Arts
 5200 North Lake Rd
 Merced CA  95343

 Physical/Delivery Address:
 University of California Merced
 ATTN: William Shadish
 School of Social Sciences, Humanities and Arts
 Facilities Services Building A
 5200 North Lake Rd.
 Merced, CA 95343

 209-228-4372 voice
 209-228-4007 fax (communal fax: be sure to include cover sheet)
 wshad...@ucmerced.edu
 http://faculty.ucmerced.edu/wshadish/index.htm
 http://psychology.ucmerced.edu

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gavin Simpson, PhD

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What computer power for GAMM models

2013-10-25 Thread Gavin Simpson
1) The model is far too complex. I have trouble imagining a system
where the trend potentially uses 49 degrees of freedom and yet the
unexplained variance within years follows an AR(7). It looks like you
have time points within years over many years; is that why you are
trying to use such a complex trend, to capture the repeated seasonal
patterns? If so, break that down and fit a spline using `bs = cc`
(or one of the other cyclic splines in mgcv) and a separate spline for
the trend. If you want to allow the seasonal pattern to vary with
time, then you can do that with multivariate splines, eg, this way you
can compare a model 1

m1 - gamm(yi ~ ti(DayOfYear, bs = cc) + ti(Time, bs = cr), )

with model 2

m2 - gamm(yi ~ ti(DayOfYear, bs = cc) + ti(Time, bs = cr) +
ti(DayOfYear, Time, bs = c(cc,cr)), )

Of course, you'll need to fix the the ARAM structure at know
coefficient (currently you are estimating it) or you aren't really
just comparing the fixed part of the models.

2) You aren't heeding Simon's warnings about fiddling with lmeControl.
You don't want EM iterations, so you need at least to have
`lmeControl(niterEM = 0)`, alongside other options

A general word of advice with using gamm(), start simply and make sure
you have lots of RAM. These models are very complex so if you start
with simple models you will be less likely to run into convergence
issues and you get results more quickly. Get lots of RAM because
models with `correlation` arguments involve inverting potentially big
covariance matrices and that can quite quickly chew through RAM.

I've had pretty good success fitting models with non-linear trends etc
to environmental time series using gamm(), but the only time things
went a bit awry was when too complex a model was required (where I had
data covering hundreds of thousands of years from an ocean core and
the trend really was very wiggly, and then the correlation structure
and the trend fought for control as there was an identifiability
issue).

HTH

Gavin

On 25 October 2013 07:19, Sylvie Martin smartin_se...@orange.fr wrote:


 Hello,

  I use GAMM function (MGCV package) in R software to study relationship 
 between pollen and pollinosis. My models include autoregressive terms. Here 
 is an exemple:

 CupAR7_2lt;-gamm(nb_rca_2~s(Trend,bs=ps,k=49) +Semaine
 nbsp;nbsp;nbsp; nbsp;nbsp;nbsp; nbsp;nbsp;nbsp; +s(TX03,bs=ps,k=8) 
 +s(UN03,bs=ps,k=4)nbsp; ..+pollen01
 ,correlation=corARMA(p=7,q=0,form=~Trend|Annees),
 niter=40,control=lmeControl(maxIter=10,msMaxIter=10)
 nbsp;nbsp;nbsp; nbsp;nbsp;nbsp; ,family=quasipoisson, data=donnees, 
 na=na.omit)

 Each model need 1/2 hour to 1 hour to get a result and I frequently get the 
 following error messages:

 Erreur dans lme.formula(fixed = fixed, random = random, data = data, 
 correlation = correlation,nbsp; :
 nbsp; nlminb problem, convergence error code = 1
 nbsp; message = singular convergence (7)

 Erreur dans na.fail.default(list(Xr.4 = c(0.00374377951791214, 
 0.00438373736920182,nbsp; :
 nbsp; missing values in object


  Does my computer lack power,(if so,what is required?) or is Rnbsp; limited 
 in speed of execution or is my syntax not ok. Here are the features of my 
 computer:

 processor: 3.10 GHznbsp; i5-2400 CPU
 RAM: 4 Go (3.87 Go)
 OS: 64 bits
 Windows 7 professionnel (Pack 1)

 Another question: how can we add a smoothing parameter in the model GAMM to 
 help convergence (sp = is not accepted)?

 Thanks for your help
 Best regards

 Sylvie Martin
 SEPIA-Santé

 Bureau d'études en épidémiologie,
 biostatistiques, environnement

 31 rue de Pontivy

 56 150 BAUD

 FRANCE
 tél : 02 97 28 80 38
 fax: 02 97 28 81 10
 [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Gavin Simpson, PhD

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Assessing temporal correlation in GAM with irregular time steps

2013-09-05 Thread Gavin Simpson
On 3 September 2013 16:10, Worthington, Thomas A
thomas.worthing...@okstate.edu wrote:
 Dear Gavin

 Thank you for the very detailed response. I had started to go down the route 
 of fitting a correlation structure via gamm.

 I tried applying your code to my data but returned the error
 Error in corCAR1(~ID | SiteCode1971) : parameter in CAR(1) structure must be 
 between 0 and 1

Sorry, that is my fault, I keep forgetting that you need to specify
the formula argument, the first argument of corCAR1() is the value of
the correlation parameter if you want to specify it. So try:

corCAR1(form = ~ID | SiteCode1971)

I do this (get that error) all the time myself.
 I set the 'bar' in your code to the sample ID (basically a number between 1 
 and 192) but I wasn't sure if this was what you meant in relation to 
 'ordering of the samples'

That is not that useful as you need to give the software something
about when the samples occur in time, otherwise it doesn't have the
information needed to properly model the decay in correlation with
time.

You need to give it the observation time, however you measured it.

HTH

G

 Best wishes
 Tom

 -Original Message-
 From: Gavin Simpson [mailto:ucfa...@gmail.com]
 Sent: Tuesday, September 03, 2013 3:17 PM
 To: Worthington, Thomas A
 Cc: r-help@r-project.org
 Subject: Re: [R] Assessing temporal correlation in GAM with irregular time 
 steps

 It is possible, but you can't use the discrete time or classical stochastic 
 trend models (or evaluate using the ACF). Also, why do you care to do this 
 with regard to DoY? The assumption of the model relates to the residuals, so 
 you should check those for residual autocorrelation.

 As you are using `mgcv::gam` you could also use `mgcv::gamm` which can then 
 leverage the correlation structures from the nlme package, which has spatial 
 correlation structures (and you can think of time as a 1-d spatial 
 direction). The package also has a `corCAR1()` correlation structure which is 
 the continuous-time analogue of the AR(1). Fitting via `gamm()` will also 
 allow you to use the `Variogram()` function from the nlme package to assess 
 the model residuals for residual autocorrelation.

 For example you could compare the two fits

 m0 - gamm(Length ~ s(DOY, by = SiteCode) + SiteCode, data = foo, method = 
 REML)
 m1 - gamm(Length ~ s(DOY, by = SiteCode) + SiteCode, data = foo, method = 
 REML,
 correlation = corCAR1( ~ bar | SiteCode))

 where `foo` is the object that contains the variables mentioned in the call, 
 and `bar` is the variable (in `foo)` that indicates the ordering of the 
 samples. Notice that I nest the CAR(1) within the two respective Sites, but 
 do note IIRC that this fits the same residual correlation structure to both 
 sites' residuals (i.e. there is 1 CAR(1) process, not two separate ones).

 require(nlme)
 anova(m0$lme, m1$lme)

 will perform a likelihood ratio test on the two models.

 If you have residual autocorrelation, do note that the smooth for DoY may be 
 chosen to be more complex than is appropriate (it might be fitting the 
 autocorrleated noise), so you may want to fix the degrees of freedom for the 
 smoother at some a priori chosen value and use this same value when fitting 
 both m0 and m1, or at the very least set an upper limit on the complexity of 
 the DoY smooth, say via s(DoY, by = SiteCode, k = 5).

 Finally, as a length = 0 insect makes no sense, the assumption of Gaussian 
 (Normal) errors may be in trouble with your data; apart from their strictly 
 positive nature, the mean-variance relationship of the data may not follow 
 that of the assumptions for the errors. You can move to a GLM (GAM) to 
 account for this but things get very tricky with the correlation structures 
 (you can use gamm() still but fitting then goes via glmmPQL() in the MASS 
 package a thence to lme()).

 If you just want to fit a variogram to something, there are a large number of 
 spatial packages available for R, several of which can fit variograms to 
 data, though you will need to study their respective help files for how to 
 use them. As for the input data, often the time/date of sampling encoded as a 
 numeric will be sufficient input, but you will need to check individual 
 functions for what they require.
 I would check out the Spatial Task View on CRAN.

 HTH

 G

 On 28 August 2013 14:26, Worthington, Thomas A 
 thomas.worthing...@okstate.edu wrote:
 I have constructed a GAM using the package mgcv to test whether the
 lengths of an emerging insect (Length) varies with day of the year
 (DOY) and between two sites (SiteCode). The data are collected at
 irregular time steps ranging from 2 days to 20 days between samples.
 The GAM takes the form

 M3 - gam(Length ~s(DOY, by = SiteCode) + SiteCode)

 As the data are a time series I would like to test for temporal 
 autocorrelation. I have read that it is not possible to use the 
 autocorrelation function (ACF) due

Re: [R] Assessing temporal correlation in GAM with irregular time steps

2013-09-03 Thread Gavin Simpson
It is possible, but you can't use the discrete time or classical
stochastic trend models (or evaluate using the ACF). Also, why do you
care to do this with regard to DoY? The assumption of the model
relates to the residuals, so you should check those for residual
autocorrelation.

As you are using `mgcv::gam` you could also use `mgcv::gamm` which can
then leverage the correlation structures from the nlme package, which
has spatial correlation structures (and you can think of time as a 1-d
spatial direction). The package also has a `corCAR1()` correlation
structure which is the continuous-time analogue of the AR(1). Fitting
via `gamm()` will also allow you to use the `Variogram()` function
from the nlme package to assess the model residuals for residual
autocorrelation.

For example you could compare the two fits

m0 - gamm(Length ~ s(DOY, by = SiteCode) + SiteCode, data = foo,
method = REML)
m1 - gamm(Length ~ s(DOY, by = SiteCode) + SiteCode, data = foo,
method = REML,
correlation = corCAR1( ~ bar | SiteCode))

where `foo` is the object that contains the variables mentioned in the
call, and `bar` is the variable (in `foo)` that indicates the ordering
of the samples. Notice that I nest the CAR(1) within the two
respective Sites, but do note IIRC that this fits the same residual
correlation structure to both sites' residuals (i.e. there is 1 CAR(1)
process, not two separate ones).

require(nlme)
anova(m0$lme, m1$lme)

will perform a likelihood ratio test on the two models.

If you have residual autocorrelation, do note that the smooth for DoY
may be chosen to be more complex than is appropriate (it might be
fitting the autocorrleated noise), so you may want to fix the degrees
of freedom for the smoother at some a priori chosen value and use this
same value when fitting both m0 and m1, or at the very least set an
upper limit on the complexity of the DoY smooth, say via s(DoY, by =
SiteCode, k = 5).

Finally, as a length = 0 insect makes no sense, the assumption of
Gaussian (Normal) errors may be in trouble with your data; apart from
their strictly positive nature, the mean-variance relationship of the
data may not follow that of the assumptions for the errors. You can
move to a GLM (GAM) to account for this but things get very tricky
with the correlation structures (you can use gamm() still but fitting
then goes via glmmPQL() in the MASS package a thence to lme()).

If you just want to fit a variogram to something, there are a large
number of spatial packages available for R, several of which can fit
variograms to data, though you will need to study their respective
help files for how to use them. As for the input data, often the
time/date of sampling encoded as a numeric will be sufficient input,
but you will need to check individual functions for what they require.
I would check out the Spatial Task View on CRAN.

HTH

G

On 28 August 2013 14:26, Worthington, Thomas A
thomas.worthing...@okstate.edu wrote:
 I have constructed a GAM using the package mgcv to test whether the lengths 
 of an emerging insect (Length) varies with day of the year (DOY) and between 
 two sites (SiteCode). The data are collected at irregular time steps ranging 
 from 2 days to 20 days between samples. The GAM takes the form

 M3 - gam(Length ~s(DOY, by = SiteCode) + SiteCode)

 As the data are a time series I would like to test for temporal 
 autocorrelation. I have read that it is not possible to use the 
 autocorrelation function (ACF) due to the irregular spacing and that 
 producing a variogram in relation to DOY would be an option.

 Is this a correct method to test for temporal autocorrelation?

 And could someone suggest the code to produce the variogram as I'm getting an 
 error related to the 'distance' argument

 Best wishes
 Tom



 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gavin Simpson, PhD

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Setting up 3D tensor product interactions in mgcv

2013-09-03 Thread Gavin Simpson
From a reading of `?ti`

 It is sometimes useful to investigate smooth models with a
 main-effects + interactions structure, for example

 f_1(x)  + f_2(z) + f_3(x,z)

 This functional ANOVA decomposition is supported by ‘ti’ terms,
 which produce tensor product interactions from which the main
 effects have been excluded, under the assumption that they will be
 included separately. For example the ‘~ ti(x) + ti(z) + ti(x,z)’
 would produce the above main effects + interaction structure. This
 is much better than attempting the same thing with ‘s’or ‘te’
 terms representing the interactions (although mgcv does not forbid
 it).

I think the following will do what you want:

mdl - gam(PA ~ ti(x) + ti(y) + ti(z) + ti(x,y) + ti(x,z) + ti(y,z) +
ti(x,y,z), )

HTH

G

On 23 August 2013 02:05, Mark Payne markpayneatw...@gmail.com wrote:
 Hi,

 I am trying to fit a smoothing model where there are three dimensions
 over which I can smooth (x,y,z). I expect interactions between some,
 or all, of these terms, and so I have set up my model as

 mdl - gam(PA ~ s(x) + s(y) + s(z) + te(x,y) + te(x,z) + te(y,z) +
 te(x,y,z),...)

 I have recently read about the ti(), tensor product interaction
 smoother, which takes care of these interaction terms elegantly and
 does the nesting properly. The help file says This is much better
 than attempting the same thing with ‘s’or ‘te’ terms representing the
 interactions (although mgcv does not forbid it). There is a 2D
 example there also. But I don't understand how I should set this up
 for my 3D example. Do I simply replace the te's above with ti? Or is
 there more to it than that?

 Does anyone have experience with this, and can explain how I should do
 it properly?

 Mark

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gavin Simpson, PhD

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple interaction terms in GAMM-model

2013-07-26 Thread Gavin Simpson
On Thu, 2013-07-25 at 04:56 -0700, Jeroen wrote:
 Dear all,
 
 I am trying to correlate a variable tau1 to a set of other variables (x1,
 x2, x3, x4), taking into account an interaction with time ('doy') and place
 ('region'), and taking into account dependency of data in time per object
 ID. My dataset looks like:

snip /

 Since the data is dependent in time per objectID, I use a GAMM model with an
 autocorrelation function. Since each variable x1, x2, etc. is dependent on
 time and place, I should incorporate this as well. 
 
 Therefore I am wondering if the following gamm-model is correct for my
 situation:
 
 model - gamm( tau1 ~ te( x1, by= doy ) + te( x1, by= factor( region ) ) +
 ... + te( x4, by= doy ) + te( x4, by= factor( region ) ) + factor( region ),
 correlation= corAR1(form= ~ doy|objectID ), na.action= na.omit ).
 
 Does anyone know if this is ok? 

The model looks a little odd - the way you've written it you don't need
`te()` smooths. For the interaction with doy, I would try

te(x1, doy, bs = c(cr, cc))

assuming x1 and doy are continuous. This specifies cubic splines and
cyclic cubic splines respectively for x1 and doy. I use a cyclic spline
for doy as we might not expect Jan 1st and Dec 31st to be much
different. By separating the interaction between x1 and doy and the one
for x1 and region you are saying that the way the effect of x1 varies
through the year does not change between regions. Is that reasonable?

I think you need something for the objectID - a random effect or a fixed
effect will account for differences in mean values of taul between
objects. A random effect seems most useful otherwise you'll eat into
your degrees of freedom, but you have plenty of those.

The correlation part assumes that the residuals follow an AR(1) applied
within the objects, with each AR(1) having the same coefficient.
Autocorrelation decreases exponentially with lag/separation. It is a
reasonable start. Fitting will be much quicker without this. Could you
try fitting without and then extract the normalised residuals to check
for residual correlation?

I hope you have a lot of memory and a lot of spare time available; my
experience of fitting similarly sized (though not as complex - I had one
*very* long series) was that gamm() used large amounts of RAM (I had
16GB and most was used for a single model) and took many days to fit.
You may find the random effect for object fights with the AR(1) so you
could end up with an odd fit if it fits at all.

You are going to want to turn on verbosity when fitting so you can see
how the optimisation is proceeding. Something like

require(nlme)
## need a control object
ctrl - lmeControl(msVerbose = TRUE,
   maxIter = 400,
   msMaxIter = 400,
   niterEM = 0, ## this is VERY important for gamm()!
   tolerance = 1e-8,
   msTol = 1e-8,
   msMaxEval = 400)

then add `control = ctrl` to the `gamm()` call.

I would approach the expecting the model to fail to fit so be prepared
to simplify it etc.

Good luck,

G

 Or should I use a model which also includes terms like  te( x1 ) + ... +
 te( x4 ).
 And is the correlation function correct?
 
 Thanks so much!!
 
 Jeroen
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Multiple-interaction-terms-in-GAMM-model-tp4672297.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Gavin Simpson, PhD  [t] +1 306 337 8863
Adjunct Professor, Department of Biology[f] +1 306 337 2410
Institute of Environmental Change  Society [e] gavin.simp...@uregina.ca
523 Research and Innovation Centre  [tw] @ucfagls
University of Regina
Regina, SK S4S 0A2, Canada

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MGCV: Degrees of freedom of smooth terms

2013-07-25 Thread Gavin Simpson
On Tue, 2013-07-23 at 11:16 +0200, Christoph Scherber wrote:
 Dear all,
 
 This is just a quick question regarding degrees of freedom in GAM
 models fit by MGCV (using select=T):
 
 Can I roughly interpret them as:
 
 1 df: linear effect of x on y
 2 df: approximately quadratic of x on y
 3 df: approximately cubic effect of x on y?

Yes, approximately

 1 df for a spatial term s(x,y): bilinear effect (?) or how would I call this?

A bilinear effect would have two df, no? In a linear regression z ~ x +
y would define a plane just like s(x, y) can and would use 2 df. 1 df
for the entire s(x,y) implies an additional penalty such that less than
1df is spent in the `x` or `y` dimensions of the spline.

 And what does ref.df in the summary output mean; is this the unpenalized 
 degrees of freedom for
 each term?

IIRC, these are the dfs that are used in the tests reported. I am not
familiar enough with the details to comment more. If you turn on Select
= TRUE for example which adds an additional penalty then the ref.df can
be much larger than the edf values.

You might send an email to Simon Wood (or see if he picks up on this)
for a (far) more authoritative answer on this part of your question.

HTH

G

 
 Thank you very much for answering!
 
 Best wishes,
 Christoph
 
 

-- 
Gavin Simpson, PhD  [t] +1 306 337 8863
Adjunct Professor, Department of Biology[f] +1 306 337 2410
Institute of Environmental Change  Society [e] gavin.simp...@uregina.ca
523 Research and Innovation Centre  [tw] @ucfagls
University of Regina
Regina, SK S4S 0A2, Canada

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error with metaMDS

2013-06-24 Thread Gavin Simpson
On Mon, 2013-06-24 at 19:33 +0800, Suparna Mitra wrote:
  H
 ello R-experts,
   I want to do ordination plots using vegan metaMDS.
 I have a where many cells have zero values.
 
snip /
 I am having two different kind of errors for these two data...
 Error 1
  ord1 - metaMDS(
  X
 =bray)

That is fundamentally wrong - you are setting argument `X` to the
character vector `bray`, that is not how you call metaMDS(). Hence I
suspect you didn't bother to include the full code...!

Let's assume you cal actually call metaMDS() correctly...

The first set of warnings (not the error) come `vegdist()`. The first is
from :

if (method  2  any(rowSums(x, na.rm = TRUE) == 0))

and hence if you do

which(rowSums(X, na.rm = TRUE) == 0)

you'll see which rows (samples) have no counts at all. The second
warning in the first set comes from

if (any(is.na(d))) 
warning(missing values in results)

hence by the time vegdist has computed the dissimilarity, those
computation ended up generating one or more `NA` values. Things go
downhill from there as the error generated from within metaMDS() is
because we are comparing the distances with the smallest representable
number and any comparison with `NA` yields `NA` and hence the first
Error you see. We should probably catch that but I don't have a
reproducible example to see why we don't...

 Square root transformation
 Wisconsin double standardization
 Error in if (any(dist  -sqrt(.Machine$double.eps))) warning(some
 dissimilarities are negative -- is this intentional?) :
   missing value where TRUE/FALSE needed
 In addition: Warning messages:
 1: In distfun(comm, method = distance, ...) :
   you have empty rows: their dissimilarities may be meaningless in method
 bray
 2: In distfun(comm, method = distance, ...) : missing values in results
 
 Error 2
 ord.data= metaMDS(data, distance=bray)
 Error in if (any(autotransform, noshare  0, wascores)  any(comm  0)) {
 :
   missing value where TRUE/FALSE needed
 In addition: Warning message:
 In Ops.factor(left, right) :  not meaningful for factors

Look at `str(data)` and if any of the columns are factors, change them
to be numeric or find out why R thinks they are factors - it wouldn't
normally convert numeric data to a factor when reading the data in.

What is `data`? You only refer to `Genus_data`.

Please try to be specific about exactly what you did - i.e. include the
actual code.

I suspect the things you try below are pointless as it is not zero
distances that are the problem but sites for which no observations were
recorded.

G

 I searched all the details of metaMDS where it is suggested to avail the
 argument 'zerodist'
 So I tried both
 
 X.dist1 - metaMDSdist(X, method=bray,zerodist = ignore)
 X.dist2 - metaMDSdist(X, method=bray,zerodist = add)
 
 Can Please help me with this.
 Thanks,
 Mitra
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Gavin Simpson, PhD  [t] +1 306 337 8863
Adjunct Professor, Department of Biology[f] +1 306 337 2410
Institute of Environmental Change  Society [e] gavin.simp...@uregina.ca
523 Research and Innovation Centre  [tw] @ucfagls
University of Regina
Regina, SK S4S 0A2, Canada

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] gamm in mgcv random effect significance

2013-06-11 Thread Gavin Simpson
On Tue, 2013-06-11 at 10:08 -0700, William Shadish wrote:
 Gavin et al.,
 
 Thanks so much for the help. Unfortunately, the command
 
   anova(g1$lme, g2$lme)
 
 gives Error in eval(expr, envir, enclos) : object 'fixed' not found

This is with mgcv:::gamm yes? Strange - did you load nlme first? I think
mgcv now imports from the nlme package so to use its functions/methods
explicitly you need to load nlme - before loading mgcv also loaded nlme,
but it no longer does so.

That *should* work.

HTH

G

 and for bam (which is the one that can use a known ar1 term), the error is
 
   AR1 parameter rho unused with generalized model
 
 Apparently it cannot run for binomial distributions, and presumably also 
 Poisson.
 
 I did find a Frequently Asked Questions for package mgcv page that said
 
 How can I compare gamm models? In the identity link normal errors case, 
 then AIC and hypotheis testing based methods are fine. Otherwise it is 
 best to work out a strategy based on the summary.gam
 
 So putting all this together, I take it that my binomial example will 
 not support a direct model comparison to test the significance of the 
 random effects. I'm guessing the best strategy based on the summary.gam 
 is probably just to compare fit indices like Log Likelihoods.
 
 If anyone has any other suggestions, though, please do let me know.
 
 Thanks so much.
 
 Will Shadish
 
 On 6/7/2013 3:02 PM, Gavin Simpson wrote:
  On Fri, 2013-06-07 at 13:12 -0700, William Shadish wrote:
  Dear R-helpers,
 
  I'd like to understand how to test the statistical significance of a
  random effect in gamm. I am using gamm because I want to test a model
  with an AR(1) error structure, and it is my understanding neither gam
  nor gamm4 will do the latter.
 
  gamm4() can't yes and out of the box mgcv::gam can't either but
  see ?magic for an example of correlated errors and how the fits can be
  manipulated to take the AR(1) (or any structure really as far as I can
  tell) into account.
 
  You might like to look at mgcv::bam() which allows an known AR(1) term
  but do check that it does what you think; with a random effect spline
  I'm not at all certain that it will nest the AR(1) in the random effect
  level.
 
  snip /
  Consider, for example, two models, both with AR(1) but one allowing a
  random effect on xc:
 
  g1 - gamm(y ~ s(xc) +z+ int,family=binomial, weights=trial,
  correlation=corAR1())
  g2 - gamm(y ~ s(xc) +z+ int,family=binomial, weights=trial, random =
  list(xc=~1),correlation=corAR1())
 
  Shouldn't you specify how the AR(1) is nested in the hierarchy here,
  i.e. AR(1) within xc? maybe I'm not following your data structure
  correctly.
 
  I include the output for g1 and g2 below, but the question is how to
  test the significance of the random effect on xc. I considered a test
  comparing the Log-Likelihoods, but have no idea what the degrees of
  freedom would be given that s(xc) is smoothed. I also tried:
 
  anova(g1$gam, g2$gam)
 
  gamm() fits via the lme() function of package nlme. To do what you want,
  you need the anova() method for objects of class lme, e.g.
 
  anova(g1$lme, g2$lme)
 
  Then I think you should check if the fits were done via REML and also be
  aware of the issue of testing wether a variance term is 0.
 
  that did not seem to return anything useful for this question.
 
  A related question is how to test the significance of adding a second
  random effect to a model that already has a random effect, such as:
 
  g3 - gamm(y ~ xc +z+ s(int),family=binomial, weights=trial, random =
  list(Case=~1, z=~1),correlation=corAR1())
  g4 - gamm(y ~ xc +z+ s(int),family=binomial, weights=trial, random =
  list(Case=~1, z=~1, int=~1),correlation=corAR1())
 
  Again, I think you need anova() on the $lme components.
 
  HTH
 
  G
 
  Any help would be appreciated.
 
  Thanks.
 
  Will Shadish
  
  g1
  $lme
  Linear mixed-effects model fit by maximum likelihood
  Data: data
  Log-likelihood: -437.696
  Fixed: fixed
  X(Intercept)   Xz XintXs(xc)Fx1
   0.6738466   -2.56883170.0137415   -0.1801294
 
  Random effects:
 Formula: ~Xr - 1 | g
 Structure: pdIdnot
 Xr1  Xr2  Xr3 Xr4
  Xr5  Xr6  Xr7  Xr8 Residual
  StdDev: 0.0004377781 0.0004377781 0.0004377781 0.0004377781 0.0004377781
  0.0004377781 0.0004377781 0.0004377781 1.693177
 
  Correlation Structure: AR(1)
 Formula: ~1 | g
 Parameter estimate(s):
  Phi
  0.3110725
  Variance function:
 Structure: fixed weights
 Formula: ~invwt
  Number of Observations: 264
  Number of Groups: 1
 
  $gam
 
  Family: binomial
  Link function: logit
 
  Formula:
  y ~ s(xc) + z + int
 
  Estimated degrees of freedom:
  1  total = 4
 
  attr(,class)
  [1] gamm list
  
 g2
  $lme
  Linear mixed-effects model fit by maximum likelihood
  Data: data

Re: [R] gamm in mgcv random effect significance

2013-06-07 Thread Gavin Simpson
On Fri, 2013-06-07 at 13:12 -0700, William Shadish wrote:
 Dear R-helpers,
 
 I'd like to understand how to test the statistical significance of a 
 random effect in gamm. I am using gamm because I want to test a model 
 with an AR(1) error structure, and it is my understanding neither gam 
 nor gamm4 will do the latter.

gamm4() can't yes and out of the box mgcv::gam can't either but
see ?magic for an example of correlated errors and how the fits can be
manipulated to take the AR(1) (or any structure really as far as I can
tell) into account.

You might like to look at mgcv::bam() which allows an known AR(1) term
but do check that it does what you think; with a random effect spline
I'm not at all certain that it will nest the AR(1) in the random effect
level.

snip /
 Consider, for example, two models, both with AR(1) but one allowing a 
 random effect on xc:
 
 g1 - gamm(y ~ s(xc) +z+ int,family=binomial, weights=trial, 
 correlation=corAR1())
 g2 - gamm(y ~ s(xc) +z+ int,family=binomial, weights=trial, random = 
 list(xc=~1),correlation=corAR1())

Shouldn't you specify how the AR(1) is nested in the hierarchy here,
i.e. AR(1) within xc? maybe I'm not following your data structure
correctly.

 I include the output for g1 and g2 below, but the question is how to 
 test the significance of the random effect on xc. I considered a test 
 comparing the Log-Likelihoods, but have no idea what the degrees of 
 freedom would be given that s(xc) is smoothed. I also tried:
 
 anova(g1$gam, g2$gam)

gamm() fits via the lme() function of package nlme. To do what you want,
you need the anova() method for objects of class lme, e.g.

anova(g1$lme, g2$lme)

Then I think you should check if the fits were done via REML and also be
aware of the issue of testing wether a variance term is 0.

 that did not seem to return anything useful for this question.
 
 A related question is how to test the significance of adding a second 
 random effect to a model that already has a random effect, such as:
 
 g3 - gamm(y ~ xc +z+ s(int),family=binomial, weights=trial, random = 
 list(Case=~1, z=~1),correlation=corAR1())
 g4 - gamm(y ~ xc +z+ s(int),family=binomial, weights=trial, random = 
 list(Case=~1, z=~1, int=~1),correlation=corAR1())

Again, I think you need anova() on the $lme components.

HTH

G

 Any help would be appreciated.
 
 Thanks.
 
 Will Shadish
 
 g1
 $lme
 Linear mixed-effects model fit by maximum likelihood
Data: data
Log-likelihood: -437.696
Fixed: fixed
 X(Intercept)   Xz XintXs(xc)Fx1
 0.6738466   -2.56883170.0137415   -0.1801294
 
 Random effects:
   Formula: ~Xr - 1 | g
   Structure: pdIdnot
   Xr1  Xr2  Xr3 Xr4  
 Xr5  Xr6  Xr7  Xr8 Residual
 StdDev: 0.0004377781 0.0004377781 0.0004377781 0.0004377781 0.0004377781 
 0.0004377781 0.0004377781 0.0004377781 1.693177
 
 Correlation Structure: AR(1)
   Formula: ~1 | g
   Parameter estimate(s):
Phi
 0.3110725
 Variance function:
   Structure: fixed weights
   Formula: ~invwt
 Number of Observations: 264
 Number of Groups: 1
 
 $gam
 
 Family: binomial
 Link function: logit
 
 Formula:
 y ~ s(xc) + z + int
 
 Estimated degrees of freedom:
 1  total = 4
 
 attr(,class)
 [1] gamm list
 
   g2
 $lme
 Linear mixed-effects model fit by maximum likelihood
Data: data
Log-likelihood: -443.9495
Fixed: fixed
 X(Intercept)   Xz XintXs(xc)Fx1
   0.720018143 -2.562155820  0.003457463 -0.045821030
 
 Random effects:
   Formula: ~Xr - 1 | g
   Structure: pdIdnot
   Xr1  Xr2  Xr3 Xr4  
 Xr5  Xr6  Xr7  Xr8
 StdDev: 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 
 7.056078e-06 7.056078e-06 7.056078e-06
 
   Formula: ~1 | xc %in% g
   (Intercept) Residual
 StdDev: 6.277279e-05 1.683007
 
 Correlation Structure: AR(1)
   Formula: ~1 | g/xc
   Parameter estimate(s):
Phi
 0.1809409
 Variance function:
   Structure: fixed weights
   Formula: ~invwt
 Number of Observations: 264
 Number of Groups:
  g xc %in% g
  134
 
 $gam
 
 Family: binomial
 Link function: logit
 
 Formula:
 y ~ s(xc) + z + int
 
 Estimated degrees of freedom:
 1  total = 4
 
 attr(,class)
 [1] gamm list
 
 

-- 
Gavin Simpson, PhD  [t] +1 306 337 8863
Adjunct Professor, Department of Biology[f] +1 306 337 2410
Institute of Environmental Change  Society [e] gavin.simp...@uregina.ca
523 Research and Innovation Centre  [tw] @ucfagls
University of Regina
Regina, SK S4S 0A2, Canada

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NMDS using Vegan

2013-05-15 Thread Gavin Simpson
On Wed, 2013-05-15 at 12:06 +0800, Suparna Mitra wrote:
 Hello R experts,
   I am new to Vegan and use trying to follow the tutorial to perform NMDS
 for my data. But after performing the metaMDS, when I plotted my results
 the default plot shows MDS1 vs MDS2. Thought according to the
 tutorial Default ordination plot should display NMDS1vs NMDS2. Why is this
 difference? Accourding to tutorial it says: Function metaMDS
 is a wrapper to perform NMDS.
 Can anybody please help me to understand this?
 Thanks,
 Mitra

They are just labels and metaMDS **has** performed an NMDS. Not sure why
Jari labelled these as MDSx. If this bothers you so, add your own
labels:

require(vegan)
data(dune)
sol - metaMDS(dune)
plot(sol, xlab = NMDS1, ylab = NMDS2)

HTH

G

-- 
Gavin Simpson, PhD  [t] +1 306 337 8863
Adjunct Professor, Department of Biology[f] +1 306 337 2410
Institute of Environmental Change  Society [e] gavin.simp...@uregina.ca
523 Research and Innovation Centre  [tw] @ucfagls
University of Regina
Regina, SK S4S 0A2, Canada

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Vegan package treatment of zeros

2013-05-15 Thread Gavin Simpson
On Wed, 2013-05-15 at 09:04 -0500, Michael Rentz wrote:
 Hello:  I have a good size database of plant cover measurements I was
 running a Shannon Diversity on.  I have every cell as a default of 0 if
 it is not present.  The tutorials I have looked at all had blanks in cases
 of absence.  I just wanted some kind of verification that VEGAN will
 recognize 0s as absent
 
 I just wanted to make sure I was not inadvertently setting myself up for
 problems.

Blanks would be interpreted as `NA` in R when you read the data in.
Vegan would respect that and most likely fail because of the missing
data as not all methods are appropriate when there are `NA` values.

Hence a common first step is to replace `NA` with 0. As you have already
have this indicated in your file external to R then you won;t need to do
this extra step *in* R.

How zeros are then treated will depend largely on the method employed
and/or the dissimilarity coefficient used, but that is not an R
question. A recent posting to the R-Sig-Ecology list by Brian Cade
presents some interesting observations on the double zero issue:

https://stat.ethz.ch/pipermail/r-sig-ecology/attachments/20130419/114d1e5f/attachment.pl

HTH

G

-- 
Gavin Simpson, PhD  [t] +1 306 337 8863
Adjunct Professor, Department of Biology[f] +1 306 337 2410
Institute of Environmental Change  Society [e] gavin.simp...@uregina.ca
523 Research and Innovation Centre  [tw] @ucfagls
University of Regina
Regina, SK S4S 0A2, Canada

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NMDS in Vegan: problems in stressplot, best solution

2013-04-26 Thread Gavin Simpson
On Fri, 2013-04-26 at 12:42 -0500, Kumar Mainali wrote:
 Hello,
 
 I can draw a basic stress plot for NMDS with the following code in package
 Vegan.
  stressplot(parth.mds, parth.dis)
 
 When I try to specify the line and point types, it gives me error message.
  stressplot(parth.mds, parth.dis, pch=1, p.col=gray, lwd=2, l.col=red)
 Error in plot.xy(xy, type, ...) : invalid plot type
 
 In the above code, if I removed line type, it does give me the plot only of
 points with my choice of type.
  stressplot(parth.mds, parth.dis, pch=1, p.col=gray)
 
 Why cannot I define both line and point at the same time?

You can. What you can't do is use argument `dis` with an metaMDS object.
If you use:

stressplot(parth.mds, pch=1, p.col=gray, lwd=2, l.col=red)

you'll see it works just fine.

We'll see about providing a better error message if you do what the
documentation asks you not to.

 If I have 100 iterations for metaMDS, then when I plot the result, does it
 give me result from best solution?

The best solution it encountered in the 100 random starts, yes.

 How do I know that.

It is implied in point 4. of the Details section of ?metaMDS.

  Can you plot the
 Stress by Iteration number?

Not in a graphical plot. The stresses for each iteration are printed to
the console at each iteration. Note these iterations are random
starts, each of which has iterations of the algorithm.

HTH

G

 parth.mds - metaMDS(WorldPRSenv, distance = bray, k = 2, trymax = 100,
 engine = c(monoMDS, isoMDS),
  autotransform =TRUE, wascores = TRUE, expand = TRUE, trace = 2)
 plot(parth.mds, type = p)
 
 Thanks in advance,
 Kumar
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RDA permutest envfit

2013-04-24 Thread Gavin Simpson
On Wed, 2013-04-24 at 10:38 -0700, Rémi Lesmerises wrote:
 Dear all,
 
 I did a RDA and when I looked to the signification of the test with
 permutest, the output was non-significant. But when I used the envfit
 function, some of the vectors are significant. All the test's
 conditions are respected. What it means? Is it an error in the script?

The two functions are testing two fundamentally different things.
`permutest` (as you invoked it --- `first = TRUE`) is testing the first
axis of the RDA. That axis is a linear combination of the constraints.

The `envfit` procedure is testing the individual correlations between
the 2-d configuration of samples in the RDA space and the direction of
maximal variance of the environmental data. Each variable is considered
separately.

I can easily imagine a case, where the variance explained on the first
axis is not significant but variance over 2 axes is significant, as one
where the vectors do not point solely along the first RDA axis but also
point along the 2nd axis. By looking only at their contributions to the
first axis and summing them you don't explain a whole lot, but when you
look at the directions in 2D space each variable explains a significant
amount of variance.

HTH

G

 Commands and output:
 
  permutest(rda.ind, perm=999, first=TRUE)
 
 Permutation test for rda 
 
 Call: rda(formula = x ~ ARH_frs + CON_frs + PRP_frs + RUI_frs +
 VAM_frs, data = z)
 Permutation test for first constrained eigenvalue
 Pseudo-F:4.093568 (with 1, 10 Degrees of Freedom)
 Significance:0.413 
 Based on 999 permutations under reduced model.
 
  fit - envfit(rda.ind, z, perm = 999, display = lc)
  fit
 
 ***VECTORS
 
  RDA1  RDA2 r2 Pr(r)
 VAM_frs  0.145281 -0.989390 0.2388  0.147
 
 ARH_frs -0.876494 -0.481413 0.6127  0.002 ** 
 CON_frs  0.904278  0.426944 0.4846  0.013 *  
 PRP_frs -0.997634  0.068755 0.9433  0.001 ***
 RUI_frs -0.648512 -0.761204 0.6243  0.004 ** 
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
 P values based on 999 permutations.
 
  
 Rémi Lesmerises, biol. M.Sc.,
 Candidat Ph.D. en Biologie
 Université du Québec à Rimouski
 remilesmeri...@yahoo.ca
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RDA permutest envfit

2013-04-24 Thread Gavin Simpson
On Wed, 2013-04-24 at 13:13 -0700, Rémi Lesmerises wrote:
 Thank you!
 
 A last question: Is it still the same explanation if I remove the
 condition first=TRUE (and then testing for all axis) and permutest
 gives the same result?

No; again `permutest(, first = FALSE)` and `envfit()` are testing
different models/fits but how they are different is not the same as with
your previous question.

`permutest(, first = FALSE)` is testing whether the variance
explained by the k environmental variable (k = 4 I believe in your case)
is sufficiently large as to be unusual when compared to the variance
explained under a null model (achieved by permuting residuals,
propagating those residuals plus fitted values to give new data and
refitting the model on those new data). `envfit()` is doing what it
always did; assessing whether the correlation between the vector and the
2-d configuration is unusually large.

In the `permutest(, first = FALSE)` case, you have a model with 4df.
In the `envfit()` case you have 4 models each with 1 degree of freedom.

The two methods also differ in terms of the role played by the
environmental data. In the `permutest` case, the environmental data are
the predictors. In the `envfit` case the env data are the responses and
the axis scores in the 2 dimensions chosen are the predictors.

HTH

G

 Rémi Lesmerises, biol. M.Sc.,
 Candidat Ph.D. en Biologie
 Université du Québec à Rimouski
 remilesmeri...@yahoo.ca
 
 
 
 On Wed, 2013-04-24 at 10:38 -0700, Rémi Lesmerises wrote:
  Dear all,
  
  I did a RDA and when I looked to the signification of the test with
  permutest, the output was non-significant. But when I used the envfit
  function, some of the vectors are significant. All the test's
  conditions are respected. What it means? Is it an error in the script?
 
 The two functions are testing two fundamentally different things.
 `permutest` (as you invoked it --- `first = TRUE`) is testing the first
 axis of the RDA. That axis is a linear combination of the constraints.
 
 The `envfit` procedure is testing the individual correlations between
 the 2-d configuration of samples in the RDA space and the direction of
 maximal variance of the environmental data. Each variable is considered
 separately.
 
 I can easily imagine a case, where the variance explained on the first
 axis is not significant but variance over 2 axes is significant, as one
 where the vectors do not point solely along the first RDA axis but also
 point along the 2nd axis. By looking only at their contributions to the
 first axis and summing them you don't explain a whole lot, but when you
 look at the directions in 2D space each variable explains a significant
 amount of variance.
 
 HTH
 
 G
 
  Commands and output:
  
   permutest(rda.ind, perm=999, first=TRUE)
  
  Permutation test for rda 
  
  Call: rda(formula = x ~ ARH_frs + CON_frs + PRP_frs + RUI_frs +
  VAM_frs, data = z)
  Permutation test for first constrained eigenvalue
  Pseudo-F:4.093568 (with 1, 10 Degrees of Freedom)
  Significance:0.413 
  Based on 999 permutations under reduced model.
  
   fit - envfit(rda.ind, z, perm = 999, display = lc)
   fit
  
  ***VECTORS
  
   RDA1  RDA2 r2 Pr(r)
  VAM_frs  0.145281 -0.989390 0.2388  0.147
  
  ARH_frs -0.876494 -0.481413 0.6127  0.002 ** 
  CON_frs  0.904278  0.426944 0.4846  0.013 *  
  PRP_frs -0.997634  0.068755 0.9433  0.001 ***
  RUI_frs -0.648512 -0.761204 0.6243  0.004 ** 
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
  P values based on 999 permutations.
  
   
  Rémi Lesmerises, biol. M.Sc.,
  Candidat Ph.D. en Biologie
  Université du Québec à Rimouski
  remilesmeri...@yahoo.ca
  
  [[alternative HTML version deleted]]
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GAM Penalised Splines - Intercept

2013-04-23 Thread Gavin Simpson
On Tue, 2013-04-23 at 17:51 +0200, Lucas Holland wrote:
 Hey all,
 
 I'm using the gam() function inside the mgcv package to fit a
 penalised spline to some data. However, I don't quite understand what
 exactly the intercept it includes by default is / how to interpret
 it. 
 
 Ideally I'd like to understand what the intercept is in terms of the
 B-Spline and/or truncated power series basis representation. 

It is the mean of the response:

library(mgcv)
set.seed(2) ## simulate some data... 
dat - gamSim(1, n=400, dist=normal, scale=2)
#Gu  Wahba 4 term additive model

 b - gam(y ~ s(x2), data = dat)
 coef(b)[1]
(Intercept) 
   7.833279 
 with(dat, mean(y))
[1] 7.833279
 b2 - gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
 coef(b2)[1]
(Intercept) 
   7.833279

In models with parametric factor terms what the intercept is depends on
the contrasts used - the default in R would represent the mean of the
response for the reference level(s) of factor terms in the model.

The intercept is separates from the penalised spline terms in the model.

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] CCA report species environment correlation

2013-04-13 Thread Gavin Simpson
On Fri, 2013-04-12 at 11:00 -0700, SRuhl wrote:
 Hi everyone,
 I did a CCA with R in the vegan package and got all the outputs I need/want
 to report except for one, that is the species environment correlation
 values. I only know of them because of one of the sources I read up on about
 CCAs reported it and I have the suspicion that it´s a value reported when
 you do a CCA in SPSS (which I have never done) and that just doesnßt come up
 in R.
 Can anyone tell me how to find it or calculate it myself? Do I need to run
 posthoc stats or additional tests on my results?
 Thanks in advance!

See `?spenvcor` with vegan loaded. Pass it your CCA model object.

G

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/CCA-report-species-environment-correlation-tp4664089.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in plot.envfit(ef, p.max = 0.1) : (subscript) logical subscript too long

2012-12-31 Thread Gavin Simpson
Hi Laura,

Following a bit of further digging, I noticed that the bug was in not
handling the levels of the factor constraints appropriately. In your
data set 12 rows (observations) were deleted because of missing data.
These observations meant that one or more of the factor variables had
more levels() than actually used in the subset of data analysed.

I have checked in some fixes for this which removes the extraneous
levels:

https://r-forge.r-project.org/scm/viewvc.php?view=revroot=veganrevision=2342

We are currently planning a release for some time in the new year, but
in the meantime, you can get a copy of the development version of Vegan
from R-forge once the new binary has been built (needs to be Rev 2342 or
later to include this fix but only Rev 2341 has been built thus far):

https://r-forge.r-project.org/R/?group_id=68

I've CC'd this back to R-Help for the record.

Thanks for taking the time to post about this and for sending me your
data so I could track this down quickly.

All the best and Happy New Year.

G

On Mon, 2012-12-31 at 14:17 +, Laura Martínez Suz wrote:
 Hi Gavin,
 
 
 Thank you! It works now, that's great, many thanks. I have spent hours
 trying to see what I was doing wrong, I'm not very good in R..
 
 
 That was my first email to the R community, should I officialy reply
 to you?
 
 
 All the best for you as well and happy New Year!
 
 
 Laura
 
 
 
  Subject: RE: [R] Error in plot.envfit(ef, p.max = 0.1) : (subscript)
 logical subscript too long
  From: gavin.simp...@ucl.ac.uk
  To: lauram...@hotmail.com
  CC: lauram...@um.es
  Date: Mon, 31 Dec 2012 12:02:09 +0100
  
  Hi Laura,
  
  Thanks for this. There is a bug which has lead me to discover
 another
  bug in that same section of code. The issue you uncovered is that
 you
  have factor variables but none are significant at the level you
  requested and we don't handle the no significant factors case
  explicitly. That led me to find another bug in that I don't think
 this
  code to select significant factors ever worked correctly.
  
  I'll need to speak to Jari about this as he wrote the code, but he
 is
  away at the moment. I'll see about coming up with a fix for this.
  
  In the meantime, I presume this will work for you as the factors are
 not
  significant, just drop them before doing the envfit() call:
  
  facs - sapply(enviromentaldata_file, is.factor)
  env - enviromentaldata_file[, !facs]
  
  ef - envfit(species, env, permutations = 999, na.rm = TRUE)
  
  plot(species, display = sites)
  plot(ef, p.max = 0.1)
  
  HTH and thanks for providing the data. I'll delete it now.
  
  All the best,
  
  Gavin
  
  On Sun, 2012-12-30 at 15:25 +, Laura Martínez Suz wrote:
   Hi Gavin,
   
   
   Many thanks! Here they go, named as in the scripts. 
   
   
   species_matrix is the file with the relative abundance of each
 species
   (394) related to the total number of species in each plot (22). 
   
   
   environmentaldata_file corresponds to data from the 22 plots in
 the
   same order than in the species file.
   
   
   I wanted to see which factors are affecting the dissimilarities in
   community composition among plots and plot the significant ones as
   vectors at p0.1.
   
   
   Sorry if it becomes a silly thing and thanks again for your
 offer..
   and yes, please, delete them! :)
   
   
   Regards,
   
   
   Laura
   
   
Subject: Re: [R] Error in plot.envfit(ef, p.max = 0.1) :
 (subscript)
   logical subscript too long
From: gavin.simp...@ucl.ac.uk
To: lauram...@hotmail.com
CC: r-help@r-project.org
Date: Sun, 30 Dec 2012 13:16:56 +0100

On Sat, 2012-12-29 at 20:40 +, Laura Martínez Suz wrote:
 Hello there,
 I'm trying to plot vectors with p0.1 in a NMDS ordination
 plot
   using p.max. Below the scripts I'm using. I guess I'm missing
   something! could you please give me a hand?

  
 species-metaMDS(species_matrix)ef-envfit(species,environmentaldata_file,permu=999,na.rm=TRUE)efplot(species,
  dis=sites)plot(ef,p.max=0.1)
 Error in plot.envfit(ef, p.max = 0.1) : (subscript) logical
   subscript too long
 Many thanks!
 Laura 

Hi Laura,

Can you send the data to me **off list** so I can run through
 the
   code
under the debugger and see what is causing the error? I can see
   where
the subscripting occurs related to p.max but can't immediately
 see
   under
what circumstances the subscript could ever get longer than the
   object
being subscripted.

I promise to delete the data once I have tracked the problem
 down.

Cheers,

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~
 %~%~
   %
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk

Re: [R] Error in plot.envfit(ef, p.max = 0.1) : (subscript) logical subscript too long

2012-12-30 Thread Gavin Simpson
On Sat, 2012-12-29 at 20:40 +, Laura Martínez Suz wrote:
 Hello there,
 I'm trying to plot vectors with p0.1 in a NMDS ordination plot using p.max. 
 Below the scripts I'm using. I guess I'm missing something! could you please 
 give me a hand?
 species-metaMDS(species_matrix)ef-envfit(species,environmentaldata_file,permu=999,na.rm=TRUE)efplot(species,
  dis=sites)plot(ef,p.max=0.1)
 Error in plot.envfit(ef, p.max = 0.1) :   (subscript) logical subscript too 
 long
 Many thanks!
 Laura   

Hi Laura,

Can you send the data to me **off list** so I can run through the code
under the debugger and see what is causing the error? I can see where
the subscripting occurs related to p.max but can't immediately see under
what circumstances the subscript could ever get longer than the object
being subscripted.

I promise to delete the data once I have tracked the problem down.

Cheers,

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] CCA plot

2012-11-28 Thread Gavin Simpson
On Tue, 2012-11-27 at 15:35 -0500, Maria Kernecker wrote:
 Hi, I have a couple questions about fitting environmental (land use
 factors, plant species presence-absence, and soil variables) constraints to
 my CCA biplot. 1. After successfully plotting species and site scores in my
 CCA, I have been trying to insert the biplot arrows of the environmental
 constraints in my data set using the text() function. When I do that, the
 plot changes completely. Is there some code or a sample script you could
 let me know about?

I'm not sure what you tried but this works for me:

require(vegan)
data(varespec)
data(varechem)
## Common but bad way: use all variables you happen to have in your
## environmental data matrix
vare.cca - cca(varespec ~ ., data = varechem)

## build up plot
plot(vare.cca, display = c(sites,species), scaling = 3)
## add the biplot arrows (though this could be done in the plot() call
text(vare.cca, scaling = 3, display = bp)

  2. I would like to include only the environment
 constraints that are significant at conf=0.95, but am not sure that I can
 do that in a CCA biplot. I was hoping that this way I could make my plot
 less crowded.

One way would be:

sig - anova(vare.cca, by = term)
## get ones with p values less than or equal to some threshold
ind - with(sig, head(`Pr(F)` = 0.05, -1))

## now get the bp scores
scrs - scores(vare.cca, display = bp, scaling = 3)
## and take the ones that are signif
scrs - scrs[ind, ]

## draw plot and add them
plot(vare.cca, display = c(sites,species), scaling = 3)
## scale then as per vegan
mul - vegan:::ordiArrowMul(scrs)
arrows(0, 0, scrs[,1] * mul, scrs[,2] * mul, length = 0.05,
   col = blue)
text(scrs * mul * 1.1, labels = rownames(scrs), cex = 0.9,
 col = blue)

Not that I recommend this pruning a CCA...

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] CCA with Vegan - Plot problem

2012-11-09 Thread Gavin Simpson
On Fri, 2012-11-09 at 00:51 -0800, rumble14 wrote:
 Hi,
 I've just started using R and am having some problems with CCA using vegan.
 
 I'm looking at abundance p/m2 (hence decimals) vs environmental variables
 and have been using
 http://ecology.msu.montana.edu/labdsv/R/labs/lab12/lab12.html to guide me
 through.
 
 My organism data looks like this:
 Sample  Species_1   Species_2   Species_3   etc
 Sample_1   0   0.12432  0.3456
 etc
 
 and is in a .csv file
 
 My environment data (.csv again) looks like this:
 TempWater   etc
 1833.3
 etc
 
 I can get vegan to run the CCA with no problems but when I enter
 
 cca1.plot - plot(cca.1,choices=c(1,2))
 
 I get the following error message:
 
 Error in function (formula, data = NULL, subset = NULL, na.action = na.fail, 
 : 
   invalid type (list) for variable 'org'
 
 ('org' is the name of my organism file once attached).

Somewhat frustratingly you have covered everything *but* how you created
`cca.1` and whilst you might presume that because an error was not
thrown everything worked out OK with that procedure, the failing
`plot()` call suggests otherwise.

Can you show the exact code you used please? Also, not sure why you are
attaching anything - it is not required for most R usage and certainly
not for using cca() in vegan.

G

 I'd appreciate some help. If I haven't given enough detail, please let me
 know. I've tested the example data given on the website and it works so am
 obviously missing something I shouldn't be.
 
 Cheers,
 Heather
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/CCA-with-Vegan-Plot-problem-tp4649012.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] CCA with Vegan - Plot problem

2012-11-09 Thread Gavin Simpson
On Fri, 2012-11-09 at 05:16 -0800, rumble14 wrote:
 Thanks for replying. Apologies for not putting the code, I have not deviated
 from that used on the website I linked to.
 
 Can you show the exact code you used please? 
 
 I am using:
 #below is Species_1 etc
 org - read.table (Organisms.S, header=TRUE, row.names=1)
 #below is environmental variables
 env-read.table (Environment.S, header=TRUE)
 attach(env)
 library(vegan)
 cca.1-(org~Temp+Water+etc)
 cca.1

Don't do that - it is an abuse of the formula interface and an
abomination to attach such objects to the search path.

This isn't actually doing anything (well, it is, but it is not a CCA):

cca.1-(org~Temp+Water+etc) 

There is no function call. And your use of `+ etc` suggests this isn't
the **actual** code you ran and I did ask for that. You should use:

cca.1 - cca(org ~ ., data = env)

assuming you want all variables in `env` in the model - not that that is
a good thing but that is a statistical issue not an R one!

Can you show me - probably *off list* including the actual data files
(which I promise to delete after we solve the problem) - the code you
use and I can take a look but before you send anything to me in an email
make sure you can paste the exact same code into R and have it run

G

 At this point I have noticed (this morning) that I should be getting the CCA
 come up here, and this is working with test data but not my actual data
 which is formatted in exactly the same way.
 If I type:
 
 cca(formula = org ~ Temp + Water+etc, data = org)
 
 at this point, I get what looks like but (you're right) might not be an
 output.
 For the graph I was then trying to use:
 
 cca1.plot - plot(cca.1,choices=c(1,2))
 
 at which point I get error messages.
 
 
 Also, not sure why you are 
 attaching anything 
 
 As I said, I'm quite new to R. It does so on the tutorial linked so  had
 assumed it was to do with being able to access the parameters Temp etc.
 
 H
 
 
 
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/CCA-with-Vegan-Plot-problem-tp4649012p4649045.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] labeling loading vectors in vegan

2012-07-31 Thread Gavin Simpson
On Fri, 2012-07-27 at 11:52 -0700, Gordon Holtgrieve wrote:
 Hello,
 
 I am using vegan to do an NMDS plot and I would like to suppress the labels
 for the loading vectors. Is this possible?  Alternatively, how can I avoid
 overlap?

Hi Gordon,

You seem to be trying to fit the species scores as a weighted sum a la
PCA. Usually we add species scores as weighted averages of the site
scores in NMDS. metaMDS() handles the latter for you:

require(vegan)
data(varespec)
data(varechem)

mod - metaMDS(varespec)
plot(mod)

and you can get the species scores directly via:

scrs - scores(mod, display = species)

 head(scrs)
  NMDS1   NMDS2
Cal.vul -0.16696628 -0.07418638
Emp.nig  0.05870236  0.10673232
Led.pal  0.88640043 -0.10135959
Vac.myr  0.71139281 -0.10964309
Vac.vit  0.04389918  0.09996126
Pin.syl -0.02588978  0.2963255

There isn't a nice way to fiddle with the plotting of envfit objects
that will stop overlap as most of the potential functions don't work
with envfit objects.

This would be easier if you worked with the WA species scores as we
could do

plot(mod)
ordipointlabel(mod, display = species, add = TRUE,
   col = forestgreen)

where the latter function will do its best to avoid overlap.

Making envfit objects work in the same way will require a good amount
of rewriting various functions which is not something I will have time
to do for a few months, but I will add it to the TODO list and discuss
with Jari about how to proceed.

Feel free to contact me off-list if you require further help with this.

All the best,

Gavin


 Many thanks for the help.
 
 
 Example code:
 #perform NMDS using metaMDS() function
 spe.nmds-metaMDS(data, distance='bray',k=2 , engine = isoMDS,
 autotransform=F, trymax=1000)
 
  #calculate the loading (i.e., variable weights) on each NMDS axis
 vec.sp-envfit(spe.nmds$points, data, perm=1000, choices=c(1:2))
 
 #plot data in ordination
 ordiplot(spe.nmds, choices=c(1:2), type='text', display=sites, xlab=Axis
 1, ylab=Axis 2)
 
 #plot loading vectors
 plot(vec.sp, p.max=0.05, col='blue')
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reception of (Vegan) envfit analysis by manuscript reviewers

2012-05-10 Thread Gavin Simpson
On Wed, 2012-05-09 at 15:51 -0600, Matt Bakker wrote:
 I'm getting lots of grief from reviewers about figures generated with
 the envfit function in the Vegan package. Has anyone else struggled to
 effectively explain this analysis? If so, can you share any helpful
 tips?
 
 The most recent comment I've gotten back: What this shows is which
 NMDS axis separates the communities, not the relationship between the
 edaphic factor and the Bray-Curtis distance.

Without further context for that quote and your manuscript to see how
you are using the method it is difficult to say whether you are doing
something silly or the reviewer is bone-headed.

I've had similar comments from reviewers about my use of the ordisurf()
function. In each case it was the reviewers' failure to understand the
methods applied that was the cause of the confusion.

As you provide little or no context I'll explain what envfit() does etc.

The idea goes back a long way (!) and is in my 1995 edition of Jongman
et al Data Analysis in Community and Landscape Ecology (Cambridge
University Press) though most likely was in 1987 version too. See
Section 5.4 of the Ordination chapter by Ter Braak in that book.

The idea is to find the direction (in the k-dimensional ordination
space) that has maximal correlation with an external variable.
Essentially, we have:

E(z_j) = b_0 + b_1x_1 + b_2x_2

where E(z_j) is the expectation (or mean, or fitted values) of the jth
external (environmental) variable, x_1 and x_2 are the axis scores in
ordination dimensions 1 and 2, and b_y are unknown regression
coefficients. This generalises to more than 2 dimensions or axes.

The biplot arrow drawn goes from (0,0) to (b_1, b_2).

You can see that the aim is to model or predict the values of the jth
environmental variable (z_j) as a linear combination of the axis or
site scores of the samples in the ordination space. Exactly the same
idea underlies the ordisurf() function except that we use a GAM and for
the right hand side of the equation multivariate splines are used which
allow a non-linear surface instead of a plane.

When applied to nMDS, if the nMDS provides a reasonable approximation to
the original dissimilarities, then envfit() will estimate and show the
strengths of the correlation and direction of maximal correlation
between the nMDS configuration and the jth enviromental variable. This
technique can be used to indicate if one or more environmental variables
are associated with differences between sites/samples as represented in
the nMDS ordination.

The big caveat is the implication that the correlation or relationship
between z_j and the ordination space is linear. ordisurf() allows you to
relax this assumption as we fit a potentially non-linear surface to the
ordination space instead of the plane that envfit() effectively produces
(though we show only the direction of change with the arrow).

So without seeing your manuscript or more context (and I'm not promising
to read it or comment more if you provide it) I would suggest that, *if*
you have applied nMDS and used envfit() correctly the combined analysis
*does* reflect the *linear* relationship between the edaphic factor and
the Bray-Curtis distance, assuming of course that the nMDS has low
stress (i.e fits the original dissimilarities well).

In future, you should consider posting similar questions
(ecological/environmental) to the R-SIG-Ecology list instead of the main
R-Help list. I know Jari (lead developer of vegan and author of
envfit() ) has stopped regularly reading the main R-Help list and you
will get far more eyes familiar with these techniques on the
R-SIG-Ecology list.

I have taken the liberty of cc'ing this to the R-Sig-Ecology list so
others can comment.

HTH

G

 Thanks for any suggestions!
 
 
 Matt
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] conducting GAM-GEE within gamm4?

2012-05-08 Thread Gavin Simpson
) =  0.317glmer.ML score =  10504  Scale est. = 1 n = 10792
  summary(b$mer)
 Generalized linear mixed model fit by the Laplace approximation
AIC   BIC logLik deviance
  10514 10551  -525210504
 Random effects:
  Groups Name Variance Std.Dev.
  Xr s(dist_slag) 1611344  1269.39
  Xr.0   s(Depth)   98622   314.04
 Number of obs: 10792, groups: Xr, 8; Xr.0, 8
 
 Fixed effects:
  Estimate Std. Error z value Pr(|z|)
 X(Intercept)  -13.968  5.145  -2.715  0.00663 **
 Xs(dist_slag)Fx1  -35.871 33.944  -1.057  0.29063
 Xs(Depth)Fx13.971  3.740   1.062  0.28823
 ---
 Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
 
 Correlation of Fixed Effects:
 X(Int) X(_)F1
 Xs(dst_s)F1  0.654
 Xs(Dpth)Fx1 -0.030  0.000
 
 
 
 I just want to make sure it is correctly allowing for correlation
 within each value for the block variable. How do I formulate the
 model to say that autocorrelation can exist within each single value
 for block, but assume independence among blocks?
 
 
 On another note, I am also receiving the following warning message
 after model completion for larger models (with many more variables
 than 2):
 
 Warning message:
 In mer_finalize(ans) : false convergence (8)
 
 
 
 
 Thank you for your time, effort, and patience dealing with a novice.
 
 Sincerely,
 
 Nathan
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Clustering analysis with ordination plots

2012-05-02 Thread Gavin Simpson
Please read the posting guide for future questions.

I presume you mean using the vegan package? If so, then see this blog
post of mine which shows how to do something similar:

http://wp.me/pZRQ9-73

If you post more details and an example I will help further if the blog
post is not sufficient for you to get the solution you want.

G

On Mon, 2012-04-30 at 09:44 -0700, borinot wrote:
 Hello to all, 
 
 I'm new to R so I have a lot of problems with it, but I'll only ask the main
 one. 
 
 I have clustered an environmental matrix with 2 different methods, and I'd
 like to plot them in a PCA and a db-RDA. I mean, I want see these clusters
 in the plots like points of differents colours, together with the rest
 information of the plot, but I don't know how to do this. 
 
 I've checked a lot of bibliography and forums, and I haven't found the
 solution... it can't be so hard! 
 
 Well, thanks in advance! :) 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Clustering-analysis-with-ordination-plots-tp4598695.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Long command in Sweave

2012-04-13 Thread Gavin Simpson
On Fri, 2012-04-13 at 17:46 +0800, Wincent wrote: 
 Dear useRs,
 
 I am writing a vignette for a package, which contains long command like this,
 reduce(Lipset_cs,SURVIVAL,c(GNPCAP, URBANIZA, LITERACY, INDLAB, 
 GOVSTAB),explain=positive,remainder=exclude,case=CASEID)
 It is longer than the width a page and part of it will become missing.
 Currently, I have to manually break the command into multiple lines.
 Is there a better way to handle such issue?

Not that I am aware of.

 It seems that others have raised similar question which seems to
 remain unsolved in a satisfactory fashion.
 
 Thanks for your kind attention in advance.
 

1) use some spacing and format the code over multiple lines

reduce(Lipset_cs, SURVIVAL,
   c(GNPCAP, URBANIZA, LITERACY, INDLAB, GOVSTAB),
   explain=positive, remainder=exclude, case=CASEID)

Isn't that more readable?! Any good R-aware editor should be able to
handle appropriate formatting of the code. I *never* write long lines in
my editor; I always break the code down to fit roughly into a 72 column
editor window.

2) if you want to force Sweave to respect your new formatting, use
argument `keep.source=TRUE` for the code chunk. Or set it document wide
using \SweaveOpts{option1=value1, option2=value2} etc in the preamble
(where optionX is one of the arguments and valueX what you want to set
that argument too.

Thought IIRC, `keep.source=TRUE` is the default now and as such Sweave
will respect your formatting by default now - before it broke lines
where it could.

In short get out of the habit of writing long lines of R code; you'll be
better in the long run laying your code out logically.

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Long command in Sweave

2012-04-13 Thread Gavin Simpson
I use Emacs and ESS, with the coding standards in one of the R manuals.
I have to insert the carriage returns where I want them, but Emacs/ESS
indents the code correctly

G

On Fri, 2012-04-13 at 22:17 +0800, Wincent wrote:
 Thanks, Gavin and Duncan.
 
 In that case, what I need is a suitable editor which can break the
 command properly.
 
 All the best
 
 On 13 April 2012 19:33, Gavin Simpson gavin.simp...@ucl.ac.uk wrote:
  On Fri, 2012-04-13 at 17:46 +0800, Wincent wrote:
  Dear useRs,
 
  I am writing a vignette for a package, which contains long command like 
  this,
  reduce(Lipset_cs,SURVIVAL,c(GNPCAP, URBANIZA, LITERACY, INDLAB, 
  GOVSTAB),explain=positive,remainder=exclude,case=CASEID)
  It is longer than the width a page and part of it will become missing.
  Currently, I have to manually break the command into multiple lines.
  Is there a better way to handle such issue?
 
  Not that I am aware of.
 
  It seems that others have raised similar question which seems to
  remain unsolved in a satisfactory fashion.
 
  Thanks for your kind attention in advance.
 
 
  1) use some spacing and format the code over multiple lines
 
  reduce(Lipset_cs, SURVIVAL,
c(GNPCAP, URBANIZA, LITERACY, INDLAB, GOVSTAB),
explain=positive, remainder=exclude, case=CASEID)
 
  Isn't that more readable?! Any good R-aware editor should be able to
  handle appropriate formatting of the code. I *never* write long lines in
  my editor; I always break the code down to fit roughly into a 72 column
  editor window.
 
  2) if you want to force Sweave to respect your new formatting, use
  argument `keep.source=TRUE` for the code chunk. Or set it document wide
  using \SweaveOpts{option1=value1, option2=value2} etc in the preamble
  (where optionX is one of the arguments and valueX what you want to set
  that argument too.
 
  Thought IIRC, `keep.source=TRUE` is the default now and as such Sweave
  will respect your formatting by default now - before it broke lines
  where it could.
 
  In short get out of the habit of writing long lines of R code; you'll be
  better in the long run laying your code out logically.
 
  HTH
 
  G
 
  --
  %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
   Dr. Gavin Simpson [t] +44 (0)20 7679 0522
   ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
   Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
   Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
   UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
  %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 
 
 
 
 
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] customizing plot, Vegan CCA

2012-04-12 Thread Gavin Simpson
On Wed, 2012-03-28 at 01:43 -0700, millerjeremya wrote:
 I’m working on a canonical correspondence analysis using Vegan. 
 I’m new to this. I would like to control the shapes and colors for each of
 my individual display points and can’t figure out the syntax. 

This comes up so often it is /the/ FAQ (well one of them anyway). So
much so I wrote a post on my blog showing how to do this. It changes the
colour of the points but you can use the same ideas to change the
plotting characters (via argument `pch`).

http://ucfagls.wordpress.com/2012/04/11/customising-vegans-ordination-plots/

See if that helps you.

G

 Here is the script I am using. It comes from Numerical Ecology in R by
 Bocard et al. 
 windows(title=CCA biplot - scaling 1, 9, 9)
 plot(spe.cca, scaling=1, display=c(lc, cn), main=Biplot CCA spe ~ env -
 scaling 1)
 
 I think I should be able to add points() and text() lines to change the
 appearance of particular elements. I also will need to refer to one or more
 specific points (e.g., make rows 1-4 a red circle, 5-8 a yellow triangle,
 etc.; make V1 say “tree cover”, V2 say “altitude”, etc.; see attached file
 CCA.jpeg). 
 
 Many thanks!
 
 http://r.789695.n4.nabble.com/file/n4511704/CCA.jpeg 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/customizing-plot-Vegan-CCA-tp4511704p4511704.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Gamm and post comparison

2012-02-29 Thread Gavin Simpson
On Wed, 2012-02-22 at 10:55 -0800, RHam wrote:
 My data set consist of number of calls (lcin) across Day. I am looking for
 activity differences between three features (4 sites per feature). I am also
 looking for peaks of activity across time (Day). I am using a gamm since I
 believe these are nonlinear trends with nested data.

You don't need gamm() for this as you don't appear to be using any
random effects. gam() will be fine for the model you show below.

I don't know enough about the workings of mgcv to know whether you can
leverage functions in the multcomp package; gam objects inherit from glm
and lm classes and multcomp can work with these, though do not the point
in ?gamObject that gam objects lack the details of the fitting expected
in glm or lm objects.

IIRC, that model could be fitted as a linear mixed model if the smooth
term is set up correctly. multcomp can work with nlme and lme4 so you
could continue to use gamm() as below but use the multcomp package on
the $lme component. The gamm4 package provides a version of gamm() that
uses lme4 instead of nlme for the underlying fitting. Again you might
look into using the lmer representation of the model with multcomp.

None of the above is tested or even based on personal experience of
doing this.

G

 gammdata-gamm(lcin~Temp+s(Day)+fType+wind+fFeature+Forest+Water+Built,
 list=fSite,data=data, family=gaussian)
 
 summary(gammdata$gam)
 summary(gammdata$lme)
 anova(gammdata$gam)
  
 I can see which variables are significant but I was wondering if there was a
 way to do a post hoc to see differences between features? Or is there a way
 to compare different models (feature) for significance (not best fit)
 against each other?
 
 Thanks,
 Rham  
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Gamm-and-post-comparison-tp4411366p4411366.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] z-transform each column of a data.frame

2012-01-20 Thread Gavin Simpson
On Fri, 2012-01-20 at 18:04 +0100, Martin Batholdy wrote:
 Hi,
 
 
 I am currently trying to z-transform (that is subtracting the mean and divide 
 by the standard deviation) multiple columns of a data.frame at the same time.
 
 
 My first approach was:
 
 x - data.frame(c(0:10), c(10:20))
 (x - colMeans(x)) / apply(x, 2, sd)
 
 
 This is obviously not working.
 
 Is there a convenient way to z-transform each column separately (so in this 
 case, each column represents an independent variable that should be 
 z-transformed)

?scale

 scale(x)
 c.0.10.   c.10.20.
 [1,] -1.5075567 -1.5075567
 [2,] -1.2060454 -1.2060454
 [3,] -0.9045340 -0.9045340
 [4,] -0.6030227 -0.6030227
 [5,] -0.3015113 -0.3015113
 [6,]  0.000  0.000
 [7,]  0.3015113  0.3015113
 [8,]  0.6030227  0.6030227
 [9,]  0.9045340  0.9045340
[10,]  1.2060454  1.2060454
[11,]  1.5075567  1.5075567
attr(,scaled:center)
 c.0.10. c.10.20. 
   5   15 
attr(,scaled:scale)
 c.0.10. c.10.20. 
3.316625 3.316625

 
 thanks!
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] any DCCA function in R?

2011-12-20 Thread Gavin Simpson
On Tue, 2011-12-20 at 09:31 +0100, Juan Santos wrote:
 Dear members,
 
 I am performing multivariate analysis on marine benthic populations 
 using R. At first glance I found ca and VEGANO packages to be the 

That would be the `vegan` package

 suitable for the task, but neither has incorporated Detrended Canonical 
 Correspondence Analysis (DCCA), which is just the method I want to apply 
 on my data. I've looked for alternative packages containing the method, 
 but my suspicion is that there is not DCCA availability for R users. 
 Does anyone have better news for me?

Nope, sorry. I'm not aware of any such function and as I maintain the
Environmetric Task View on CRAN I come across most ecologically-related
packages one way or another.

I can think of two reasons to use a DCCA; i) to estimate a gradient
length in environmental space on axis 1, and ii) to remove an arch in a
CCA.

ii) can more easily be addressed by reducing the number of terms in the
CCA model or a different ordination method, and i) could no doubt be
done in a number of other ways should you really want it.

As far as I am aware DCCA was only ever implemented in CANOCO (possibly
also in one of the other DOS applications from the good old days of
quantitative ecology). Jari ported the original DECORANA code for DCA to
vegan as it was available for us to do so. I am not aware of a
free/unencumbered source code for DCCA, and given that and the limited
number of applications for the method, it hasn't been something we have
been motivated to do.

What do you want to do with DCCA? Perhaps people on the list can suggest
alternatives? You might also consider posting that message (what you
want to do with DCCA) on the R-SIG-Ecology list as there'll be more
people there who will be familiar with you specific area of interest.

HTH

G

 
 Javier Murillo
 
 PHD Candidate
 IEO-CO VIGO
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Vegan: Diversity Plot, label points

2011-11-29 Thread Gavin Simpson
On Wed, 2011-11-23 at 16:02 -0300, Alejo C.S. wrote:
 Dear List,
 
 I can'f figure how to add point labels in the next plot (example from
 ?taxondive help page):
 
 library(vegan)
 data(dune)
 data(dune.taxon)
 taxdis - taxa2dist(dune.taxon, varstep=TRUE)
 mod - taxondive(dune, taxdis)
 plot(mod)
 
 The points in this plot are diversity values of single sites, and I'd
 like to add a label to each one. The plot command don't accept a
 label argument.
 Any tip?

A couple of options:

with(mod, text(Species, Dplus, label = rownames(dune), 
   pos = 2, cex = 0.7))

fiddle with pos - see ?text for details.

Another option is

with(mod, identify(Species, Dplus, label = rownames(dune), cex = 0.9))

where you now click around the points you want to label. See ?identify
for further details.

HTH

G

 Thanks in advance.
 
 Alejo
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] I cannot get species scores to plot with site scores in MDS when I use a distance matrix as input. Problems with NA's?

2011-11-29 Thread Gavin Simpson
On Thu, 2011-11-24 at 07:57 -0800, B77S wrote:
 Try the daisy() function from the package cluster, it seems to be able to
 handle NAs and non-dummy coded character variables
 
 metaMDS(daisy(df, metric=gower))

That won't help the OP as the species scores (the species data, i.e. the
traits in this case) can not be computed from a site x site
dissimilarity matrix. This is has the same problem as the OPs existing
problem.

G

 
 
 
 Edwin Lebrija Trejos wrote
  
  Hi, First I should note I am relatively new to R so I would appreciate
  answers that take this into account.
   
  I am trying to perform an MDS ordination using the function “metaMDS” of
  the “vegan” package. I want to ordinate species according to a set of
  functional traits. “Species” here refers to “sites” in traditional
  vegetation analyses while “traits” here correspond to “species” in such
  analyses.  
   
  My data looks like this:
   
   Trait1   Trait2 Trait3  Trait4  Trait5  Trait…  
  Species1 228.44   16.56   1.66   13.22 1 short 
  Species2 150.55   28.07   0.41   0.60  1 mid
  Species3 NA   25.89 NA   0.55  0 large
  Species4 147.70   17.65   0.42   1.12 NA large
  Species… 132.68  NA   1.28   2.75  0 short
  
   
  Because the traits have different variable types, different measurement
  scales, and also missing values for some species, I have calculated the
  matrix of species distances using the Gower coefficient of similarity
  available in Package “FD” (which allows missing values). 
  My problem comes when I create a bi-plot of species and traits. As I have
  used a distance matrix in function “metaMDS” there are no species scores
  available. This is given as a warning in R: 
   
   NMDSgowdis- metaMDS(SpeciesGowdis)
  plot(NMDSgowdis, type = t)
  Warning message:In ordiplot(x, choices = choices, type = type, display =
  display, :Species scores not available” 
   
  I have read from internet resources that in principle I could obtain the
  trait (species) scores to plot them in the ordination but my attempts
  have been unsuccessful. I have tried using the function “wascores” in
  package vegan and “add.spec.scores” in package BiodiversityR. For this
  purpuse I have created a new species x traits table where factor traits
  were coded into dummy variables and all integer variables (including
  binary) were coerced to numeric variables. Here are the codes used and the
  error messages I have got: 
   
  “ NMDSgowdis- metaMDS(SpeciesGowdis)
  NMDSpoints-postMDS(NMDSgowdis$points,SpeciesGowdis)
  NMDSwasc-wascores(NMDSpoints,TraitsNMDSdummies)
  Error in if (any(w  0) || sum(w) == 0) stop(weights must be non-negative
  and not all zero) : missing value where TRUE/FALSE needed” 
   
  I imagine the problem is with the NA’s in the data. 
  Alternatively, I have used the “add.spec.scores” function,
  method=”cor.scores”, found in package BiodiversityR. This seemed to work,
  as I got no error message, but the species scores were not returned. Here
  the R code and results:
  “
  A-add.spec.scores(ordi=NMDSgowdis,comm=TraitsNMDSdummies,method=cor.scores,multi=1,
  Rscale=F,scaling=1)
  plot(A)
  Warning message:In ordiplot(x, choices = choices, type = type, display =
  display, :Species scores not available“
   
  Can anyone guide me to get the trait (“species”) scores to plot together
  with my species (“site”) scores?
  Thanks in advance,
  Edwin
  

  __
  R-help@ mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
  
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/I-cannot-get-species-scores-to-plot-with-site-scores-in-MDS-when-I-use-a-distance-matrix-as-input-Pr-tp4103699p4104406.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained

Re: [R] I cannot get species scores to plot with site scores in MDS when I use a distance matrix as input. Problems with NA's?

2011-11-29 Thread Gavin Simpson
On Thu, 2011-11-24 at 12:16 +, Edwin Lebrija Trejos wrote:
 Hi, First I should note I am relatively new to R so I would appreciate
 answers that take this into account.
  
 I am trying to perform an MDS ordination using the function “metaMDS”
 of the “vegan” package. I want to ordinate species according to a set
 of functional traits. “Species” here refers to “sites” in traditional
 vegetation analyses while “traits” here correspond to “species” in
 such analyses.  
  
 My data looks like this:
  
  Trait1   Trait2 Trait3  Trait4  Trait5  Trait…  
 Species1 228.44   16.56   1.66   13.22 1 short 
 Species2 150.55   28.07   0.41   0.60  1 mid
 Species3 NA   25.89 NA   0.55  0 large
 Species4 147.70   17.65   0.42   1.12 NA large
 Species… 132.68  NA   1.28   2.75  0 short
 
snip /

 Can anyone guide me to get the trait (“species”) scores to plot
 together with my species (“site”) scores?
 Thanks in advance,
 Edwin

I think you should pass metaMDS the species trait matrix and then get it
to use FD to compute the dissimilarities. Note from ?metaMDS there is a
distfun argument for metaMDSdit. metaMDS passes the community data to
metaMDSdist to compute the dissimilarities.

The only snag is that gowdis doesn't accept a `method` argument so we
need a wrapper:

wrapper - function(x, method, ...) {
gowdis(x, ...)
}

then do

metaMDS(SpeciesGowdis, distfun = wrapper, X)

where  represents any other arguments you want to pass to gowdis via
our wrapper. Essentially the wrapper ignores the method argument, we
just need it as metaMDSdist wants to call the dissimilarity function
with a method argument.

This is not tested as there wasn't reproducible code, but hopefully
you'll be able to work it out from the above.

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cca with repeated measures

2011-11-18 Thread Gavin Simpson
On Fri, 2011-11-18 at 10:25 +0100, René Mayer wrote:
 Dear all,
 How can I run a constrained correspondence analysis with
 the following data:
 15 animals were measured repeatedly month-wise (over to 2 years)  
 according to ther diet composition (8 food categories).
 
 our data.frame looks like this:
 
 food 1  2 ... 8  sex season year animal
 freq 12 8 ... 1  0   summer 2011 1
 freq 0  7 ... 0  1   winter 2011 1
 ...
 freq 0  7 ... 0  1   spring 2011 15
 
 We want to find out if season and sex influences diet composition.
 My experience with CCA is limited, but in repeated measures ANOVA,  
 e.g. with aov()
 on has to define the between (animal) error term in order to deal with  
 the pseudoreplication.
 Do I have to restructure or reshape the data in order to deal with  
 pseudoreplication
 the data? Or do I have to define an error strata?
 I suspect I cannot simply run:
 
 library(vegan)
 model=cca(food ~ season*sex+year+animal, data)

You could do that although the analysis would be i) focussed on those
particular animals in those years, and ii) you could only test the terms
season, sex and season:sex in a sequential manner (i.e. dependent upon
how the terms enter the model), so season, then sex after season is in
the model, then their interaction after both main terms are included in
the model.

ii) is done by adding `by = terms` to the call to the `anova method
for cca objects; examples are in `?anova.cca`

That corresponds to a fixed effects formulation of the ANOVA (assuming I
have my terminology right). The alternative is to adjust the permutation
scheme used to reflect the clustering in your data. In that case, using
`strata = animal` would be OK. Ideally one would want to control for
temporal dependence so you would want cyclic shifts *within* `strata =
animal` but vegan can not yet do this sort of permutation. It is coming;
the actual code to generate those permutations is available in the
permute package (upon which vegan depends), but as yet we have not
hooked this into the vegan ordination functions (it is on the TODO
list). That said, season should be accounting for much of the temporal
dependence, so I think you might get away with just specifying strata
(the permutation test itself is permuting residuals from the model so if
the model is capturing the seasonal variation then the simple permuting
within animal should be OK, but you can check by extracting the
residuals using the resid() method and plotting them out by time and
animal).

HTH

G

 I would be grateful for any help.
 
 Thanks in advance,
 René
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cca with repeated measures

2011-11-18 Thread Gavin Simpson
On Fri, 2011-11-18 at 15:17 +0100, René Mayer wrote:
 Thanks a lot Gavin!,
 this was what I was looking for.
 Have I got this right that with no 'cyclic shifts *within* strata' you  
 mean that I cannot define a nesting within animal, e.g.,  
 animal/year/season (speaking in regression-terms  random-effects for  
 the animal-specific season and year variation).

In the permutation framework, you condition the permutation on animal
(`strata = animal` in the anova() method), but that alone would say that
observations are freely exchangeable within each animal, but not
exchangeable between animals. As you have time series data then the
observations are not really freely exchangeable within animal either. We
can try to maintain the within-animal temporal dependence structure by
using cyclic shift permutations:

 require(permute)
 shuffleSeries(1:10)
 [1]  4  5  6  7  8  9 10  1  2  3
 t(replicate(10, shuffleSeries(1:10)))
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   1012345678 9
 [2,]12345678910
 [3,]3456789   101 2
 [4,]   1012345678 9
 [5,]789   1012345 6
 [6,]89   10123456 7
 [7,]9   101234567 8
 [8,]3456789   101 2
 [9,]456789   1012 3
[10,]56789   10123 4

cyclic shifts basically take a random starting point for the permuted
timeseries and then keep sample in the order they appear in the data,
bending the ends of the series round together into a circle. The above
example:

 shuffleSeries(1:10)
 [1]  4  5  6  7  8  9 10  1  2  3

shows a single cyclic shift of the observations 1:10 in an equally
spaced time series. Here the random starting point was observation 4 and
note that we join the ends of the timeseries so after observation 10, we
have observation 1.

There are problems with this if there is a trend because you'll get a
discontinuity when you join the two ends.

As you only have two years of data, and include season as a fixed effect
(main term in the model), you could assume that the residuals within
animal might be free of temporal dependence. You can check that as I
said by fitting the model and looking at the residuals within animal
over time.

Unless you are prepared to do some coding, the above discussion is moot
as vegan doesn't possess the code to use these restricted permutations
yet. If you want to do it yourself by hacking the anova.ccabyterm()
function, take a look at the vignette that comes with the permute
package for ideas on how to specify the permutation design you want and
then generate the permutations using shuffle() or shuffleSet() instead
of via permuted.index() that is used in vegan.

Basically, with:

mod - cca(food ~ season*sex, data)
anova(mod, by = term, strata = data$animal)

I guess you are allowing for a random intercept in animal only. For
animal specific season and year effects you'll need to include animal
and year in the model with appropriate interactions - we don't, as far
as I know, have ways of dealing with more complex dependencies in the
residuals via permutations built into vegan other than permuting within
animal, or maintaining temporal structure within animal (if you hack the
code yourself using functions from permute).

HTH

G

 thanks,
 René
 
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Permutations

2011-11-18 Thread Gavin Simpson
On Fri, 2011-11-18 at 11:03 -0500, Gyanendra Pokharel wrote:
 Hi all,
 why factorial(150) shows the error out of range in 'gammafn'?
 I have to calculate the number of subset formed by 150 samples taking 10 at
 a time. How is this possible?
 best

Do you mean:

 choose(150, 10)
[1] 1.169554e+15

??

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] permutation within rows of a matrix

2011-11-17 Thread Gavin Simpson
On Wed, 2011-11-16 at 14:55 -0800, Peter Ehlers wrote:
 I must be missing something. What's wrong with
 
   t(apply(mat, 1, sample))
 
 ?

Only missing that I am either[*] i) stupid, ii) being too clever, iii)
down on my coffee intake for the day.

G

[*] delete as applicable any that don't apply. ;-)

 Peter Ehlers
 
 On 2011-11-16 12:12, Gavin Simpson wrote:
  On Wed, 2011-11-16 at 14:29 -0500, R. Michael Weylandt wrote:
  Suppose your matrix is called X.
 
  ? sample
  X[sample(nrow(X)),]
 
  That will shuffle the rows at random, not permute within the rows.
 
  Here is an alternative, first using one of my packages (permute -
  shameful promotion ;-) !:
 
  mat- matrix(sample(0:1, 100, replace = TRUE), ncol = 10)
 
  require(permute)
  perms- shuffleSet(10, nset = 10)
  ## permute mat
  t(sapply(seq_len(nrow(perms)),
function(i, perms, mat) mat[i, perms[i,]],
mat = mat, perms = perms))
 
  If you don't want to use permute, then you can do this via standard R
  functions
 
  perms- t(replicate(nrow(mat), sample(ncol(mat
  ## permute mat
  t(sapply(seq_len(nrow(perms)),
function(i, perms, mat) mat[i, perms[i,]],
mat = mat, perms = perms))
 
  HTH
 
  G
 
  Michael
 
  On Wed, Nov 16, 2011 at 11:45 AM, Juan Antonio Balbuenabalbu...@uv.es  
  wrote:
  Hello
  This is probably a basic question but I am quite new to R.
 
  I need to permute elements within rows of a binary matrix, such as
 
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]000010000 0
[2,]001100011 0
[3,]010000100 0
[4,]000000110 0
[5,]000100001 0
[6,]001100000 1
[7,]000000000 0
[8,]110100010 1
[9,]100101010 0
  [10,]000000010 1
 
 
  That is, elements within each row are permuted freely and independently 
  from
  the other rows.
 
  I see that is is workable by creating a array for each row, performing
  sample and binding the arrays again, but I wonder whether there is a more
  efficient way of doing the trick.
 
  Any help will be much appreciated.
 
 
 
  --
  View this message in context: 
  http://r.789695.n4.nabble.com/permutation-within-rows-of-a-matrix-tp4076989p4076989.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] permutation within rows of a matrix

2011-11-16 Thread Gavin Simpson
On Wed, 2011-11-16 at 14:29 -0500, R. Michael Weylandt wrote:
 Suppose your matrix is called X.
 
 ? sample
 X[sample(nrow(X)),]

That will shuffle the rows at random, not permute within the rows.

Here is an alternative, first using one of my packages (permute -
shameful promotion ;-) !:

mat - matrix(sample(0:1, 100, replace = TRUE), ncol = 10)

require(permute)
perms - shuffleSet(10, nset = 10)
## permute mat
t(sapply(seq_len(nrow(perms)), 
 function(i, perms, mat) mat[i, perms[i,]],
 mat = mat, perms = perms))

If you don't want to use permute, then you can do this via standard R
functions

perms - t(replicate(nrow(mat), sample(ncol(mat
## permute mat
t(sapply(seq_len(nrow(perms)), 
 function(i, perms, mat) mat[i, perms[i,]],
 mat = mat, perms = perms))

HTH

G

 Michael
 
 On Wed, Nov 16, 2011 at 11:45 AM, Juan Antonio Balbuena balbu...@uv.es 
 wrote:
  Hello
  This is probably a basic question but I am quite new to R.
 
  I need to permute elements within rows of a binary matrix, such as
 
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
   [1,]000010000 0
   [2,]001100011 0
   [3,]010000100 0
   [4,]000000110 0
   [5,]000100001 0
   [6,]001100000 1
   [7,]000000000 0
   [8,]110100010 1
   [9,]100101010 0
  [10,]000000010 1
 
 
  That is, elements within each row are permuted freely and independently from
  the other rows.
 
  I see that is is workable by creating a array for each row, performing
  sample and binding the arrays again, but I wonder whether there is a more
  efficient way of doing the trick.
 
  Any help will be much appreciated.
 
 
 
  --
  View this message in context: 
  http://r.789695.n4.nabble.com/permutation-within-rows-of-a-matrix-tp4076989p4076989.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ordination in vegan: what does downweight() do?

2011-11-08 Thread Gavin Simpson
On Mon, 2011-11-07 at 10:24 -0800, kelsmith wrote: 
 Can anyone point me in the right direction of figuring out what downweight()
 is doing?
  
 I am using vegan to perform CCA on diatom assemblage data. I have a lot of
 rare species, so I want to reduce the influence of rare species in my CCA. I
 have read that some authors reduce rare species by only including species
 with an abundance of at least 1% in at least one sample (other authors use
 5% as a rule, but this removes at least half my species).

That is not what downweight() is for. If you want this sort of
selection, see chooseTaxa() in my analogue package

diat.sel - chooseTaxa(diatoms, max.abun = 1, type = AND)

or, if proportions not percent

diat.sel - chooseTaxa(diatoms, max.abun = 0.01, type = AND)

This sort of indexing is trivial (but I made a typo in the current
version so type = OR won't work) so you can study the code of
analogue:::chooseTaxa.default once I fix the CRAN version or on R-Forge
now:

https://r-forge.r-project.org/scm/viewvc.php/pkg/R/chooseTaxa.R?view=markuproot=analogue

Jari has addressed the other part of your question.

Jari also mentioned the issue about whether you should be removing or
downweighting rare species. Many people, especially diatomists, do this
for practical purposes in a routine fashion because their data sets are
especially speciose and have a large proportion of low abundance taxa.
As a general matter of routine practice, I don't think this is a very
good way of working, especially as we have no good ecological grounds
for doing so and who knows what information these species could be
telling us if we just listened to them instead of discarding them.

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pdIdent in smoothing regression model

2011-10-17 Thread Gavin Simpson
On Sun, 2011-10-09 at 10:25 -0400, Lei Liu wrote:
 Hi there,
 
 I am reading the 2004 paper Smoothing with mixed model software in 
 Journal of Statistical Software, by Ngo and Wand. I tried to run 
 their first example in Section 2.1 using R but I had some problems. 
 Here is the code:

I'm not going to try to fix this nor work out how to do it via lme(),
but just wanted to note that you can use mgcv:::gam() for example to fit
something very similar.

require(SemiPar)
require(mgcv)

data(fossil)
mod - gam(strontium.ratio ~ s(age), data = fossil,
   method = REML)

That is treating the smoothness terms as random effects.

Also note that Matt Wand wrote the SemiPar package as support software
for the book Semiparametric Regression, where these example data are
taken from:


 library(nlme)
 fossil - read.table(fossil.dat,header=T)
 x - fossil$age
 y - 10*fossil$strontium.ratio
 knots - seq(94,121,length=25)
 n - length(x)
 X - cbind(rep(1,n),x)
 Z - outer(x,knots,-)
 Z - Z*(Z0)
 fit - lme(y~-1+X,random=pdIdent(~-1+Z))
 
 When I ran the code
 
 fit - lme(y~-1+X,random=pdIdent(~-1+Z))
 
 I got an error message:
 
 Error in getGroups.data.frame(dataMix, groups) :   Invalid formula for groups
 
 I was really puzzled. I asked Dr. Ngo and he said they did it in 
 S-plus but not R. Does anyone knows how to do it in R? Thanks!
 
 Lei Liu
 Associate Professor
 Division of Biostatistics
 Department of Public Health Sciences
 University of Virginia School of Medicine
 
 http://people.virginia.edu/~ll9f/
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pdIdent in smoothing regression model

2011-10-17 Thread Gavin Simpson
On Sun, 2011-10-09 at 10:25 -0400, Lei Liu wrote:
 Hi there,
 
 I am reading the 2004 paper Smoothing with mixed model software in 
 Journal of Statistical Software, by Ngo and Wand. I tried to run 
 their first example in Section 2.1 using R but I had some problems. 
 Here is the code:

[whoops - fingers slipped and sent the message prematurely]

I'm not going to try to fix this nor work out how to do it via lme(),
but just wanted to note that you can use mgcv:::gam() for example to fit
something very similar.

require(SemiPar)
require(mgcv)

data(fossil)
mod - gam(strontium.ratio ~ s(age), data = fossil,
   method = REML)

That is treating the smoothness terms as random effects.

Also note that Matt Wand wrote the SemiPar package as support software
for the book Semiparametric Regression, where these example data are
taken from:

http://cran.r-project.org/web/packages/SemiPar/index.html

You could use the function contained therein.

The above presumes you weren't wanting to make the example work for some
pedagogic reason.

HTH

G

 library(nlme)
 fossil - read.table(fossil.dat,header=T)
 x - fossil$age
 y - 10*fossil$strontium.ratio
 knots - seq(94,121,length=25)
 n - length(x)
 X - cbind(rep(1,n),x)
 Z - outer(x,knots,-)
 Z - Z*(Z0)
 fit - lme(y~-1+X,random=pdIdent(~-1+Z))
 
 When I ran the code
 
 fit - lme(y~-1+X,random=pdIdent(~-1+Z))
 
 I got an error message:
 
 Error in getGroups.data.frame(dataMix, groups) :   Invalid formula for groups
 
 I was really puzzled. I asked Dr. Ngo and he said they did it in 
 S-plus but not R. Does anyone knows how to do it in R? Thanks!
 
 Lei Liu
 Associate Professor
 Division of Biostatistics
 Department of Public Health Sciences
 University of Virginia School of Medicine
 
 http://people.virginia.edu/~ll9f/
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotting issues with PCA

2011-10-17 Thread Gavin Simpson
On Mon, 2011-10-17 at 21:02 +1000, Andrew Halford wrote:
 Hi Listers,
 
 This has a simple answer but it has been eluding me nonetheless.

ordiellipse() doesn't work the way you think it does - it can only take
a single colour at a time. Therefore you need to do /n/ calls to
ordiellipse() to draw /n/ ellipses.

Here is a *reproducible* example, taken from ?ordiellipse:

require(vegan)
data(dune)
data(dune.env)
mod - cca(dune ~ Management, dune.env)
plot(mod, type=n, display = sites)
text(mod, display=sites, labels = as.character(Management))

## vector of colours
cols - c(blue,red,darkgreen,grey70)

## add ellipses
with(dune.env, ordiellipse(mod, Management, kind=se, conf=0.95,
   lwd=2, col=cols[1], show.groups = BF))
with(dune.env, ordiellipse(mod, Management, kind=se, conf=0.95,
   lwd=2, col=cols[2], show.groups = HF))
with(dune.env, ordiellipse(mod, Management, kind=se, conf=0.95,
   lwd=2, col=cols[3], show.groups = NM))
with(dune.env, ordiellipse(mod, Management, kind=se, conf=0.95,
   lwd=2, col=cols[4], show.groups = SF))

We could automate this a bit:

## set up plotting region
plot(mod, type=n, display = sites)
text(mod, display=sites, labels = as.character(Management))

## get the levels of the factor for plotting groups
lev - with(dune.env, levels(Management))
## vector of colours
cols - c(blue,red,darkgreen,grey70)

## loop to draw each group
for (i in seq_along(lev)) {
with(dune.env, ordiellipse(mod, Management, kind=se, conf=0.95,
   lwd=2,
   col=cols[i], ## ith colour
   show.groups = lev[i])) ## for ith group
}

This works with rda() too.

HTH

G

 I have been building a PCA plot from scratch with the ability to plot
 predefined groups in different colors. This has worked fine but when I try
 to get a polygon drawn around each of the groups it is not recognising my
 colour file correctly and is only printing the first colour in the
 filecode is below
 
  site.codings -
 c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,4,4,4,4,4,3,3,3,6,6,6,6,6,6,6,6,5,5,5,5)
  names(site.codings) - c( WM1, WM2, WM3, NM1, NM2, NM3, KH1,
 KH2, KH3, LM1 ,LM2 ,LM3, DB1 ,DB2 ,DB3, DM1 , DM2 ,
 DM3 , FI1,  FI2,  BKI1, BKI2, BKO1, BKO2, BKO3,
 SUR1,MI1,MI2,MI3,BHE1,BHE2,BHE3,BHW1,BHW2,BHW3,HAL1,HAL2,HAL3,HAL4,HAL5,HAL6,HAL7,DOH1,DOH2,DOH3,DOH4,DOH5)
  fish.pca -rda(fish.sqrt.h)
  fish.site - scores(fish.pca,display=sites,scaling=3)
  fish.spp -
 scores(fish.pca,display=species,scaling=3)[omanfish.mrt.indval$pval=0.05,]
  graph - plot(fish.pca,display=c(sites,species),type=n,scaling=3)
  plotcolor -
 c(red,green,blue,aquamarine,magenta,yellow)[site.codings]
  points(fish.site,pch=21,bg=plotcolor,cex=1.2)
 
 #up to this point all works well but when I try to draw the polygons I cant
 get the lines to colour code the same way as the points did
 
  ordiellipse(graph,site.codings,kind=sd,conf=0.90,draw=polygon)
 
 I see there is a command called show.groups but I cant work out how to use
 it to access the plotcolor file.
 
 Any help appreciated.
 
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] binomial GLM quasi separation

2011-10-17 Thread Gavin Simpson
On Sat, 2011-10-15 at 09:11 -0700, lincoln wrote:
 #Uwe:
snip /
 #Gavin:
 
 I have read carefully your thread but I am not sure to understand what you
 are suggesting (my gaps in statistics!). You say that it should be due to
 the /Hauck Donner/ effect and that it is not a quasi separation or
 separation issue. Even though, I am still unsure to understand why I found
 such a high asymptotic standard error.

I don't believe this is a separation issue - the sorts of things we'd
expect to see if this were due to separation do not show up.

Given the large estimate for the coefficient for the term it is not that
surprising that the associated uncertainty is also high:

 set.seed(2)
 var(runif(100))
[1] 0.08911998
 set.seed(2)
 var(runif(100)*1)
[1] 8911998

All I did there was increase the units of the data in the second
example and the variance is huge, but only because the data were
expressed in units 1 times bigger than the first example. In the
same way, the coefficient estimate is large so it's standard error is
also large; the question one needs to ask is, is the estimate of the
coefficient for hcp bounded away from zero, given the uncertainty in the
estimate.

If you were to produce a profile confidence interval it too would be
large.

So you have a large estimate, which is somewhat uncertain. Given that
the slope of the log likelihood is low at the estimate and quite
different from the slope at \beta == 0, it is not unreasonable to assume
that the Hauck Donner effect might be present...

 Anyway, how should I consider this result? Should I find another way to
 analyze this process or I could consider this as correct?

...however, in the case of the snippet of data you showed, it doesn't
affect the result - on the basis of the Wald test you would still accept
that hcp is significant/important. The Hauck Donner effect might be
leading to a lower value of the test statistic, but it hasn't affected
the outcome of the test.

To check, fit the model with and without hcp and then use the anova()
function to compare the two models. This will do a likelihood ratio
test.

 If I am understanding this enough, this warning message

Possibly, but it could just be that the fitted probabilities really are
0 or 1.

  and the very high
 estimates should be due to  /Hauck-Donner/. Regarding that reference to
 Venables and Ripley (2002) on this issue, I have found this ( 
 http://kups.ku.edu/maillist/classes/ps707/2005/msg00023.html Hauck-Donner  )
 where it is said that The practical advice, then, is to run the model with
 all of the variables, and then run again with the questionable one removed,
 and conduct a likelihood ratio test./ and I suppose that the p-values for
 hcp should be the LRT p-value, isn't it?

Yes. Well it is the result of applying a likelihood ratio test. I don't
think there is such a thing as *the* p-value for a term in a model, just
different ways of computing *a* p-value.

In this case, what does it matter? If the Wald test is *under*estimating
z but the term *is* still significant, the LRT should only confirm this
and give an even lower p-value than the already very low one.

 Thanks for taking your time to help me in this.

Would it hurt you to reply via an email? Regardless of what Nabble
thinks, R-help is a mailing list and your *posts* keep on removing all
the context - I have to keep on hunting for the thread in the archives
just to keep track of what you have told us.

G

 Simone
 
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/binomial-GLM-quasi-separation-tp3901687p3907716.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What does \Sexpr[results=rd]{} exactly mean in Rd?

2011-10-17 Thread Gavin Simpson
On Sun, 2011-10-16 at 19:36 -0500, Yihui Xie wrote:
 Hi,
 
 I have spent a few hours on the R-exts manual and the documentation of
 parse_Rd() (as well as the PDF document in the references), but I
 still have not figured out what results=rd means. I thought I could
 use an R code fragment to create an Rd fragment dynamically. Here is
 an example, in which I was expected the output to be a describe list
 DL in HTML, but it turns out not to be true.

Perhaps best not to cross post to several internet resources at once. I
replied to the same Q on StackOverflow:

http://stackoverflow.com/q/7788628/429846

Suffice it to say that your example works for me with 2.13.1 (still need
to compile 2.13.2 on my workstation). I left some additional comments
and examples, which might help understand this. I had trouble when I
first started playing this and didn't pursue further, but I think I am
starting to understand how to use this now after taking a look when I
tried to answer your Q.

G

 (I was actually building a package with Rd's containing \Sexpr{}
 instead of really using Rd2HTML(); the content was not rendered after
 I run R CMD build.)
 
 des - \\describe{\\item{def}{ghi}}
 con - textConnection(c(\\title{abc}\\name{abc},
 \\details{\\Sexpr[results=rd,stage=build]{des}}))
 z - parse_Rd(con)
 Rd2HTML(z, stages = build)
 close(con)
 
 !DOCTYPE html PUBLIC -//W3C//DTD HTML 4.01 Transitional//EN
 htmlheadtitleR: abc/title
 meta http-equiv=Content-Type content=text/html; charset=utf-8
 link rel=stylesheet type=text/css href=R.css
 /headbody
 
 table width=100% summary=page for abctrtdabc/tdtd
 align=rightR Documentation/td/tr/table
 
 h2abc/h2
 
 h3Details/h3
 
 pdefghi/p
 
 
 /body/html
 
 
  sessionInfo()
 R version 2.13.2 (2011-09-30)
 Platform: x86_64-pc-linux-gnu (64-bit)
 
 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
 
 attached base packages:
 [1] tools stats graphics  grDevices utils datasets  methods
 [8] base
 
 other attached packages:
 [1] devtools_0.4
 
 loaded via a namespace (and not attached):
 [1] RCurl_1.6-10
 
 
 Thanks!
 
 Regards,
 Yihui
 --
 Yihui Xie xieyi...@gmail.com
 Phone: 515-294-2465 Web: http://yihui.name
 Department of Statistics, Iowa State University
 2215 Snedecor Hall, Ames, IA
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] binomial GLM quasi separation

2011-10-14 Thread Gavin Simpson
(sex~twp+hwp+hcp+hnp,binomial,data=dati)
 Mensajes de aviso perdidos
 glm.fit: fitted probabilities numerically 0 or 1 occurred 
 pp-profileModel(glm.sat,quantile=qchisq(0.95,1),objective=ordinaryDeviance,which=4)
 Preliminary iteration . Done
 
 Profiling for parameter hcp ... Done
 pp1-profileModel(glm.sat,quantile=qchisq(0.95,1),objective=ordinaryDeviance,which=4,stepsize=20)
 Preliminary iteration . Done
 
 Profiling for parameter hcp ... Done
 pp2-profileModel(glm.sat,quantile=qchisq(0.95,1),objective=ordinaryDeviance,which=4,stepsize=100)
 Preliminary iteration . Done
 
 Profiling for parameter hcp ... Done
 plot(pp)
 mtext(Default stepsize,adj=0,cex=2,line=1)
 dev.new()
 plot(pp)
 mtext(Stepsize=20,adj=0,cex=2,line=1)
 dev.new()
 plot(pp)
 mtext(Stepsize=100,adj=0,cex=2,line=1)
 
 And these are the plots as they look like:
 http://r.789695.n4.nabble.com/file/n3904261/plot1.png 
 http://r.789695.n4.nabble.com/file/n3904261/plot2.png 
 http://r.789695.n4.nabble.com/file/n3904261/plot3.png 
 
 I have tried to understand what is going on but I don't know how to
 interpret this.
 It's quite a long time that I am trying to solve this but I have not been
 able to. Here (  http://r.789695.n4.nabble.com/file/n3904261/simone.txt
 simone.txt  ) I attach a subset of the data I am working with that comprises
 the variables specified in the above glm model and by the way the funky
 variable called hcp.
 Thank you for take your time to help me in this.
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/binomial-GLM-quasi-separation-tp3901687p3904261.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about GAMs

2011-10-13 Thread Gavin Simpson
On Thu, 2011-10-13 at 08:20 -0700, pigpigmeow wrote:
 I'm confused...
 I type ..
 predict.gam(ozonea,type=s(ratio,bs=cr))

That is not a valid 'type'; normally you'd use `type = terms` or `type
= iterms`, depending on whether you want (any) standard errors to
include the uncertainty about the intercept. 

   pressure  maxtemp   s(avetemp) s(ratio)
 1   -0.0459102290 -0.185178463  0.263358446 -0.164558673
 2   -0.0286464652 -0.194731320  0.199315027  0.727823293
 30.0478073459 -0.013227033  0.002228896  0.342373202
 4   -0.0089164494  0.082301539 -0.037331159 -0.067260889
 50.0675373617  0.024984396 -0.047067786 -0.357569389
 60.0823348735  0.101407254 -0.075884852 -1.485036738
 7   -0.0977015204  0.177830112 -0.094755158  0.236575309
 8   -0.0903027645  0.225594398 -0.113346667  0.435141242
 90.0206785742  0.187382969 -0.066346157 -0.256133513
 10  -0.1371615520  0.101407254 -0.131656887  0.145057584
 11  -0.0477674066 -0.181001505  0.260279546  0.180513043
 12  -0.0921599421 -0.009050075 -0.020511366  0.281470433
 13   0.0681464361 -0.219212934  0.335348247  0.270813178
 ..
 
 I want to show s(ratio,bs=cr) term, and show the warning message
 Warning message:
 In predict.gam(ozonea, type = s(ratio, bs = cr)) :
   Unknown type, reset to terms.
 
 I don't understand what does it mean.

So if the above comments I made are not sufficient to explain this,
read ?predict.gam and see what the allowed options for the 'type'
argument are.

G

 By the way, i use log-link function, 
 should I convert log-link function of newozone to fitted value of newozone?
 1. log(newozone) - s(ratio,bs=cr) = x
 then   exp(x)
 
 2. exp(newozone) - s(ratio,bs=cr) =X
 then x
 
 which one is correct?
 so confused
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Question-about-GAMs-tp3900848p3901842.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Vegan: Anova.CCA accessing original data using option by=margin

2011-10-11 Thread Gavin Simpson
On Mon, 2011-10-10 at 23:51 -0700, Steve Pawson wrote:
 Hello,
 
 I am attempting to use the ANOVA.CCA function with the by=margin option.
 The process works fine using the by=terms option and I note in the Vegan
 manual that Jari suggests that an error may occur if the anova does not have
 access to the data on the original constraints.
 
 This is the error that I get:
 
 Error in dimnames(x) - dn : 
   length of 'dimnames' [2] not equal to array extent
 
 My question is, does anyone know if this error relates to what Jari is
 referring to (or is it a different problem), and if it is, how do I link the
 anova to the original constraints?

It is almost impossible to answer that without a lot more information.
For starters, what does traceback() say when run immediately *after* you
get the error?

G

 Many thanks for any help provided.
 
 Regards
 
 Steve
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Vegan-Anova-CCA-accessing-original-data-using-option-by-margin-tp3893005p3893005.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About stepwise regression problem

2011-10-07 Thread Gavin Simpson
On Thu, 2011-10-06 at 20:12 -0700, pigpigmeow wrote:
 chris,
 I'm not using lmer, i just use gam mixed with smoothing function and linear
 function
snip /
 I have the following question,
 1.if I reject the variable term which has greater the p-value no matter the
 variable term is smoothing term or linear term, is it correct to perform
 stepwise regression.
  2. In my model
 noxd-gam(newNOX~pressure+maxtemp+s(avetemp,bs=cr)+s(mintemp,bs=cr)+s(RH,bs=cr)+s(solar,bs=cr)+s(windspeed,bs=cr)+s(transport,bs=cr),family=gaussian
 (link=log),groupD,methods=REML) , is it generalized additive mixed model?

That is not a GAMM. It is a GAM.

 3. what the different if I use other criteria such as AIC or BIC? 

Don't do what you are doing. Heed Frank's advice that stepwise selection
without shrinkage is invalid.

For feature selection in GAMs, use the select = TRUE argument to gam()
to turn on an addition penalty that can shrink terms out of the model,
hence doing feature selection for you.

HTH

G

 Anyway, thank all of you!
   
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/About-stepwise-regression-problem-tp3870217p3880835.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adonis and nmds help and questions for a novice.

2011-10-04 Thread Gavin Simpson
On Tue, 2011-10-04 at 08:45 +, Ashley Houlden wrote:
 Hi,
 
 forgive me if someone has already posted about this but I have had a
 look and cannot find the answer, also I am very new to R and been
 getting the grips with this.
 
 I have been trying to use Adonis to find out if there are significant
 difference between groups on data that I have analyses with NMDS, and
 have been struggling with getting this to work and understanding what
 is going on.  I am looking at diversity in different soils with either
 woodland or grassland habitats.
 
 I have run the scripts
 
 library(vegan)
 library(ecodist)
 library(MASS)
 mydata - read.table(ash_data.csv, header=TRUE, sep=,,
 row.names=Site)
 
 envdata_fit - read.table(ash_env.csv, header=TRUE, sep=,,
 row.names=Site)
 
 #distance matrix of samples using bray curtis
 d= bcdist(mydata, rmzero=FALSE)
 
 And then using the distance matrix from this to use for adoins? Is
 this correct.
 
 With this I have then run Adonis
 
 results = adonis(d ~ wood, envdata_fit, permutations = 1000)
 
 and get significant values to see if sig diff in diversity between
 wood and grass habitat.
 
 However I have been reading about combining the variables, but there
 seems to be different ways for example
 
 results = adonis(d ~ wood+soil, envdata_fit, permutations = 1000)
 
 so get sig values for Wood and soil
 
 or
 
 results = adonis(d ~ wood*soil, envdata_fit, permutations = 1000)
 
 And I get sig values for wood, soil, and wood soil interaction.
 
 This seems to make sense, however for both if I put the variable the
 other way around (soil+wood or soil*wood) I get very different sig
 values, even accounting for the fact they vary slightly due to the
 permutations. So whats is going on and why to the the values change so
 much?

You can isolate the effects due to different permutations being used by
setting a seed via set.seed().

As ?adonis says, sequential sums of squares are used. If there is
imbalance in your design it isn't surprising that the results are not
invariant to the ordering of terms in the formula.

 I was also wondering in Adonis, can you nest treatments, so see effect
 of soil removing the effect of woodland as you can with anova?

Not 100% sure what you mean by nested, but adonis() uses the full
functionality of R's formula interface. See The R manual for details
or ?formula. ?adonis also has details of how you might test a nested
design in the Details section - this might not be what you want but it
does allow you to test for an effect of one variable by conditioning the
permutations on another.

 Another general questions as well, if I have more than two groups in a
 treatment, say for soil, clay, sand, loam and do the stats, and I get
 a significant value, what does it actually mean, is it that soil
 generally has an effect, with each group separate, or there are
 general differences between soils which may be one group is very
 different to the other two?

The permutation test, test at the level of the factor, not pairwise
comparisons of the levels within the factor. So you get information on
Soil, not on Clay, Sand, Loam levels. This is the same as you would get
if you did anova(mod) where mod was a linear model with a factor
predictor.

betadisper() the sister function to adonis() which tests for differences
of multivariate dispersions, not differences of multivariate means, does
allow the sorts of pairwise tests you are thinking of, but we haven't
implemented this in adonis yet I'm afraid.

HTH

G

 Many many thanks to anyone who can help me as I have asked people who
 use R near me and no-one is sure and uses Adonis..
 
 Ash

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adonis and nmds help and questions for a novice.

2011-10-04 Thread Gavin Simpson
On Tue, 2011-10-04 at 08:45 +, Ashley Houlden wrote:
 Hi,
snip /
 #distance matrix of samples using bray curtis
 d= bcdist(mydata, rmzero=FALSE)

In addition, you don't necessarily need ecodist for the bray curtis
distance. vegdist() in vegan will compute this for you.

Not that there is anything wrong with ecodist I hasten to add - just
that you can do this all in vegan if you wanted.

G

 And then using the distance matrix from this to use for adoins? Is this 
 correct.
 
 With this I have then run Adonis
 
 results = adonis(d ~ wood, envdata_fit, permutations = 1000)
 
 and get significant values to see if sig diff in diversity between wood and 
 grass habitat.
 
 However I have been reading about combining the variables, but there seems to 
 be different ways for example
 
 results = adonis(d ~ wood+soil, envdata_fit, permutations = 1000)
 
 so get sig values for Wood and soil
 
 or
 
 results = adonis(d ~ wood*soil, envdata_fit, permutations = 1000)
 
 And I get sig values for wood, soil, and wood soil interaction.
 
 This seems to make sense, however for both if I put the variable the other 
 way around (soil+wood or soil*wood) I get very different sig values, even 
 accounting for the fact they vary slightly due to the permutations. So whats 
 is going on and why to the the values change so much?
 
 I was also wondering in Adonis, can you nest treatments, so see effect of 
 soil removing the effect of woodland as you can with anova?
 
 Another general questions as well, if I have more than two groups in a 
 treatment, say for soil, clay, sand, loam and do the stats, and I get a 
 significant value, what does it actually mean, is it that soil generally has 
 an effect, with each group separate, or there are general differences between 
 soils which may be one group is very different to the other two?
 
 Many many thanks to anyone who can help me as I have asked people who use R 
 near me and no-one is sure and uses Adonis..
 
 Ash
 
 
 No virus found in this message.
 Checked by AVG - www.avg.comhttp://www.avg.com
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interaction plot type=o

2011-10-03 Thread Gavin Simpson
On Fri, 2011-09-30 at 12:33 -0400, David Winsemius wrote:
 On Sep 30, 2011, at 2:16 AM, Petr PIKAL wrote:
 
 
  David,
  thank you for your reply
 
  I tried this
  attach(mtcars)
  interaction.plot(cyl, gear, mpg, type=o, pch=5:8, lty=1 )
 
  but I got this error:
  Error in match.arg(type) : 'arg' should be one of l, p, b
 
  and in ?interaction.plot, o it is not listed in type arguments.
  Is there any other way to force it to take the argument?
 
  As David suggested you need to change code for interaction.plot. If  
  you
  write
 
  interaction.plot
 
  you will get the code
 
  interaction.plot
  function (x.factor, trace.factor, response, fun = mean, type = c(l,
 p, b), legend = TRUE, trace.label =
  deparse(substitute(trace.factor)),
 fixed = FALSE, xlab = deparse(substitute(x.factor)), ylab = ylabel,
 ylim = range(cells, na.rm = TRUE), lty = nc:1, col = 1, pch =  
  c(1L:9,
 0, letters), xpd = NULL, leg.bg = par(bg), leg.bty = n,
 xtick = FALSE, xaxt = par(xaxt), axes = TRUE, ...)
  {...
 
  Copy it into some suitable text editor (not Word please) and change it
  according to your wish.

Or use `fix()` or `fixInNamespace()` if this is just a one off.

G

 You can do it at the console on both the OS versions of R I have used.  
 just copy the code and paste. Add the - and the ,o and hit enter.  
 Piece of cake.
 
  I would start with adding o to type in function
  definition and see how it behaves.
 
 I tested it. Worked as expected.
 
  Then you can copy the whole code to
  your modified function e.g.
 
  my.int.plot - function(x,
 
  and call
  my.int.plot(cyl, gear, mpg, type=o, pch=5:8, lty=1 )
 
  Regards
  Petr
 
 
 
  Thanks
  Claudio
 
  On Thu, Sep 29, 2011 at 9:00 PM, David Winsemius
  dwinsem...@comcast.netwrote:
 
 
  On Sep 29, 2011, at 7:22 PM, Heverkuhn Heverkuhn wrote:
 
  Hello,
  I was wondering if there is any equivalent of interaction.plot  
  that allow
  you to set type=o
  I tried to use interaction.plot and I have a gap between the  
  symbols of
  the points and the line.
 
 
  If it's OK to have the lines going right though the symbols, then  
  go ahead,
  hack the code. All you need to do is add ,o to the type  
  arguments in the
  argument list. The code's not hidden or anything that gets in your  
  way.
 
 
  David Winsemius, MD
  West Hartford, CT
 
 
 
 David Winsemius, MD
 West Hartford, CT
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GAMs in R : How to put the new data into the model?

2011-09-29 Thread Gavin Simpson
On Wed, 2011-09-28 at 02:10 -0700, pigpigmeow wrote:
 For example: 
 GAMs and after stepwise regression:

You probably don't want to be doing stepwise model/feature selection in
any regression model. Marra  Wood (2011, Computational Statistics and
Data Analysis 55; 2372-2387) show results that suggest adding an extra
penalty term during model fitting, that allows terms to be penalised out
of the model, performs well for feature selection in a series of GAM
exercises.

You can turn this on in mgcv::gam() via the `select` argument by setting
it to `TRUE`.

 cod-gam(newCO~RH+s(solar,bs=cr)+windspeed+s(transport,bs=cr),family=gaussian
 (link=log),groupD,methods=REML)
 
 I used 10 year meterorology data (2000-2010) to form equation of
 concentration of carbon monoxide.
 NOW, I have 2011 meteorology data, I want to use the above GAMs to get the
 predict value of concentration of carbon monoxide.

So put your 2011 data into a data frame with (at the minimum) the column
names:

`RH`, `solar`, `windspeed`, `transport`

If that data frame were called, say, `dat2011` then you provide the
`predict()` method for gam objects with both the fitted model **and**
the new data (`dat2011`), e.g.:

pred - predict(cod, newdata = dat2011)

See ?predict.gam for more details. You are probably going to need type =
response so you get values back on the scale of the response not the
the link function.

HTH

G

 How can I put the 2011 data into this gam and get the expected value of
 concentration of Carbon monoxide??
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/GAMs-in-R-How-to-put-the-new-data-into-the-model-tp3849851p3850473.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] metaMDS

2011-09-28 Thread Gavin Simpson
On Fri, 2011-09-23 at 12:43 -0500, Jean V Adams wrote:
 Lineth Contreras wrote on 09/23/2011 11:35:10 AM:
  
  Hello R-user community,
  
  I am applying the function metaMDS. However, I would like to know if 
 there
  is any option to export the data I got from the axis as a data frame.
  
  I have tried as.data.frame.list but is not working.  Any suggestion?
  
  Thank you in advance for your help,
  
  Lineth
 
 
 When you say the data I got from the axis do you mean the coordinates 
 contained in the $points of the resulting object?  If so, something like 
 this should work (using the example provide in ?metaMDS):

You would be better off with the scores() method for metaMDS objects:

data(dune)
sol - metaMDS(dune)
scrs - scores(sol)

`scrs` is a matrix:

 class(scrs)
[1] matrix

which can be exported via say `write.csv()`:

write.csv(scrs, filenames.csv)

If you want a data frame in R, then

SCRS - as.data.frame(scrs)

will work, but there is little reason to convert to a data frame just to
export it out of R.

HTH

G

 data(dune)
 library(MASS)
 sol - metaMDS(dune)
 df - as.data.frame(sol$points)
 
 Jean
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with predict.gam in package gam

2011-09-15 Thread Gavin Simpson
On Wed, 2011-09-14 at 18:06 -0500, Drew Tyre wrote:
 I am trying to reproduce plots in Chapter 3 of Zuur et al Mixed
 Effects models and extensions in Ecology. For pedagogical reasons,
 they make a series of plots with gam(...) in package gam. I encounter
 errors that trace back to the predict.gam method. Below is a
 repeatable example using one of the example datasets:

The code works for me with:

 sessionInfo()
R version 2.13.1 Patched (2011-07-08 r56332)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8   LC_NUMERIC=C  
 [3] LC_TIME=en_GB.UTF-8LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8   LC_NAME=C 
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] splines   grid  stats graphics  grDevices utils
datasets 
[8] methods   base 

other attached packages:
[1] gam_1.04 analogue_0.7-4   princurve_1.1-10 MASS_7.3-13 
[5] lattice_0.19-23  vegan_1.17-12   

loaded via a namespace (and not attached):
[1] tools_2.13.1

G

 library(gam)
 data(kyphosis)
 test = gam(Kyphosis ~ s(Age,4) + Number, family = binomial, data=kyphosis)
 predict(test) # works
 predict(test,se.fit=TRUE) # fails
 #Error in dim(data) - dim : attempt to set an attribute on NULL
 
 # after debugging, it wants a model frame ...
 test = gam(Kyphosis ~ s(Age,4) + Number, family = binomial,
 data=kyphosis, model=TRUE)
 predict(test,se.fit=TRUE) # fails, but with a different error
 # Error in terms.default(object, data = data) :
 #   no terms component nor attribute
 
 I am using R 2.13.1 on a PC inside RStudio.
 
 I realize I can do this in package mgcv, but just curious as to
 whether this is something known and fixable?
 
 Thanks
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mvpart analyses with covariables

2011-09-13 Thread Gavin Simpson
On Tue, 2011-09-13 at 13:46 +0800, 'Ben Ford' wrote:
 Hi all,
 
 I am fairly new to R and I am trying to run mvpart and create a MRT using
 explanatory variables and covariables. I've been following the procedures in
 Numerical Ecoogy with R.
 
 The command (no covariables) which works fine -
 
 ABUNDTMRT - mvpart(abundance ~
 .,factors,margin=0.08,cp=0,xv=1se,xval=nrow(abundance),xvmult=100,which=4)
 
 where abundance is 4th root transformed fish abundance (103 species x 168
 samples), and factors is the relief (high, medium, low profile, sand
 inundated reef, flat), benthos (coral, sessile inverts, kelp, macroalgae,
 seagrass, sand), depth (continuous in meters), latitude, and longitude of
 each sample.
 
 To try and incorporate spatial autocorrelation (as a covariate) into this I
 have been trying the command -
 
 ABUNDTMRT - mvpart(abundance ~ environ + spatial,
 data.frame,margin=0.08,cp=0,xv=1se,xval=nrow(abundance),xvmult=100,which=4)

I really don't think you can do that, and even if you could it is not a
good thing to do.

Arrange your variables (explanatory or covariables) into a single data
frame, then, if this data frame contains only variable that go into the
model (explanatory or covariables), you can use the `.` notation:

ABUNDTMRT - mvpart(abundance ~ ., data = my.data.frame, ...)

However, if you are using covariable in the sense of Canoco, then I
think you are out of luck. There isn't a way to partial out the spatial
effects and fit a model using the non-spatial components of the
environmental/explanatory variables.

I wonder where you got the impression that this could be done?

Of course, you can do this (sort of) if you are prepared to do a bit of
work; i) ordinate the abundance data with spatial covariables as
constraints and take the residuals (all site scores in unconstrained
axes), ii) do the the same thing with the environment so that you
residualise the explanatory variables, iii) take the residualised
species data and the residualised explanatory matrix and use them as
inputs to mvpart(). There may be infelicities here if you need to use
different ordination techniques to residualise the species and the
explanatory variables matrices...

HTH

G

 where abundance is as above, environ is the environmental factors (from
 above) and spatial is the eigenfunctions from a PCNM analysis. data.frame is
 the environ and spatial factors as a data.frame.
 
 This gives the error -
 
 Error in `[[-.data.frame`(`*tmp*`, preds, value = c(72L, 72L, 80L, 72L,
  :
   replacement has 504 rows, data has 168
 
 As I am new to this, I am not sure if I am entering an incorrect formula
 when trying to include the covariables, or if this is just something which
 mvpart cannot do.
 
 Thanks.
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] On-line machine learning packages?

2011-09-13 Thread Gavin Simpson
 
   guidehttp://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
   __
   r-h...@r-project.org mailing 
   listhttps://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting 
   guidehttp://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
   __
   r-h...@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting 
   guidehttp://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
 
  __
  r-h...@r-project.org mailing 
  listhttps://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] envfit vector labels with ordiplot3d

2011-09-13 Thread Gavin Simpson
On Tue, 2011-09-13 at 05:18 -0700, Briony wrote:
 It *really* does help to read the documentation for functions. Like most
 of the functions in vegan that extract information from ordinations,
 ordilabel() extracts information for axes 1 and 2 only, by default.

It also helps if I read the docs too!

The reason this was in 2d and will always be in 2d is that that is how
scatterplot3d works - the arrow head in 2d on the device has been placed
at the 2d translation of the original 3d coordinates.

It might help to push the label away from the arrow head a little bit:

ordilabel(pl$arrows * 1.075)

will push the label 7.5% away from the arrow head.

It seems that that is the best we can currently do with vegan.

When you say that this is not what you wanted, what did you want to
achieve?

G

 Does
 
 ordilabel(pl$arrows, choices = 1:3)
 
 work for you?
 
 G
 
 Thanks for the suggestion. Unfortunately that didn't work - it returned an
 error (Error in scores.default(x, choices = choices, display = display, ...)
 : subscript out of bounds). The ordilabel help says ordilabel(x, display,
 labels, choices = c(1, 2)... which seems to suggest the function might not
 allow a third axis at the moment?
 Cheers,
 Briony
 
 
 On Tue, Sep 13, 2011 at 2:53 AM, Gavin Simpson [via R] 
 ml-node+s789695n3807984...@n4.nabble.com wrote:
 
  On Mon, 2011-09-12 at 03:24 -0700, Briony wrote:
   Thank you very much for the suggestion. And while I'm here, thank you for
 
   vegan and the documentation that goes with it.
  
   Function ordiplot3d uses scatterplot3d, and it returns also all
   scatterplot3d items, like functions xyz.converter and points3d that
   can be used for tuning labels.
  
   I tried ordilabel(pl$arrows) but the labels only seem to be in two
   dimensions.
 
  It *really* does help to read the documentation for functions. Like most
  of the functions in vegan that extract information from ordinations,
  ordilabel() extracts information for axes 1 and 2 only, by default.
 
  Does
 
  ordilabel(pl$arrows, choices = 1:3)
 
  work for you?
 
  G
 
   With ordixyplot I can see no other choice than that you edit the
   function and preferably contribute your edited function to vegan
   (and will be credited with the function help).
  
   I'm not a skilled enough user of R to edit the ordixyplot function - so
  I'll
   pass on that invitation to anyone else who reads this thread?
  
   Thanks again,
   Briony
  
   Briony brionynorton at gmail.com writes:
  
   
Hi R experts,
   
I'm looking for some help with plotting vectors from envfit in vegan,
  onto
a
3d plot using ordiplot3d. So far I have
   
data.mds - metaMDS(data, k=3,trace = FALSE)
vect_data-envfit(data.mds,vegdata[,3:21],choices=1:3,permu=)
ordiplot3d(data.mds,envfit=vect_data)
ordixyplot(data.mds,pch=pts,envfit=vect_data)
   
(my data's not really called data, I thought it might be easier to
communicate this way)
   
These display the vectors as arrows, but what I would really like is
  for
the
arrows to be labelled, like what comes up automatically in ordirgl or
  with
a
2D ordiplot.
   
I've gone through the help and tried everything I can work out, but I
  must
be missing something important, because nothing's worked so far. I
  would
be
happy to use ordixyplot and show a series of 2D plots, but I can't get
labels on those arrows either.
   
Any pointers in the right direction would be gratefully received.
Briony
  
   Briony,
  
   There really is no way to do this automatically, but if someone fixes the
 
   functions, we are happy to incorporate those changes in vegan.
  
   You may be able to achieve something like that with ordiplot3d, but I am
   not sure it looks completely satisfactory. Function ordiplot3d returns
   invisibly the plotting object which contains, among other items. the
   coordinates of arrow heads in the  flattened graph. So this could work:
  
   pl - ordiplot3d(data.mds,envfit=vect_data)
   ordilabel(pl$arrows)
  
   Function ordiplot3d uses scatterplot3d, and it returns also all
   scatterplot3d items, like functions xyz.converter and points3d that
   can be used for tuning labels.
  
   With ordixyplot I can see no other choice than that you edit the
   function and preferably contribute your edited function to vegan
   (and will be credited with the function help).
  
   Cheers, Jari Oksanen
  
   __
   [hidden email] mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
   --
   View this message in context:
  http://r.789695.n4.nabble.com/envfit-vector-labels-with-ordiplot3d-tp3800669p3807015.html
   Sent from the R help mailing list archive at Nabble.com.
  
   __
   [hidden email

Re: [R] envfit vector labels with ordiplot3d

2011-09-12 Thread Gavin Simpson
On Mon, 2011-09-12 at 03:24 -0700, Briony wrote:
 Thank you very much for the suggestion. And while I'm here, thank you for
 vegan and the documentation that goes with it.
 
 Function ordiplot3d uses scatterplot3d, and it returns also all
 scatterplot3d items, like functions xyz.converter and points3d that
 can be used for tuning labels.
 
 I tried ordilabel(pl$arrows) but the labels only seem to be in two
 dimensions.

It *really* does help to read the documentation for functions. Like most
of the functions in vegan that extract information from ordinations,
ordilabel() extracts information for axes 1 and 2 only, by default.

Does

ordilabel(pl$arrows, choices = 1:3)

work for you?

G

 With ordixyplot I can see no other choice than that you edit the
 function and preferably contribute your edited function to vegan
 (and will be credited with the function help).
 
 I'm not a skilled enough user of R to edit the ordixyplot function - so I'll
 pass on that invitation to anyone else who reads this thread?
 
 Thanks again,
 Briony 
 
 Briony brionynorton at gmail.com writes:
 
 
  Hi R experts,
 
  I'm looking for some help with plotting vectors from envfit in vegan, onto
  a
  3d plot using ordiplot3d. So far I have
 
  data.mds - metaMDS(data, k=3,trace = FALSE)
  vect_data-envfit(data.mds,vegdata[,3:21],choices=1:3,permu=)
  ordiplot3d(data.mds,envfit=vect_data)
  ordixyplot(data.mds,pch=pts,envfit=vect_data)
 
  (my data's not really called data, I thought it might be easier to
  communicate this way)
 
  These display the vectors as arrows, but what I would really like is for
  the
  arrows to be labelled, like what comes up automatically in ordirgl or with
  a
  2D ordiplot.
 
  I've gone through the help and tried everything I can work out, but I must
  be missing something important, because nothing's worked so far. I would
  be
  happy to use ordixyplot and show a series of 2D plots, but I can't get
  labels on those arrows either.
 
  Any pointers in the right direction would be gratefully received.
  Briony
 
 Briony,
 
 There really is no way to do this automatically, but if someone fixes the
 functions, we are happy to incorporate those changes in vegan.
 
 You may be able to achieve something like that with ordiplot3d, but I am
 not sure it looks completely satisfactory. Function ordiplot3d returns
 invisibly the plotting object which contains, among other items. the
 coordinates of arrow heads in the  flattened graph. So this could work:
 
 pl - ordiplot3d(data.mds,envfit=vect_data)
 ordilabel(pl$arrows)
 
 Function ordiplot3d uses scatterplot3d, and it returns also all
 scatterplot3d items, like functions xyz.converter and points3d that
 can be used for tuning labels.
 
 With ordixyplot I can see no other choice than that you edit the
 function and preferably contribute your edited function to vegan
 (and will be credited with the function help).
 
 Cheers, Jari Oksanen
 
 __
 [hidden email] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code. 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/envfit-vector-labels-with-ordiplot3d-tp3800669p3807015.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] packfor

2011-08-17 Thread Gavin Simpson
On Wed, 2011-08-17 at 09:49 +0200, COCCIA , CRISTINA wrote:
 
 Good morning all,  
 
 I'm trying to find the package packfor to install on my library in
 R, but I'm not available to find it online, so I would ask to you if
 you please let e know how I could find it and if I need a special R
 version to do it. 

packfor is part of the *sedar* project on R-forge. You can get a binary
or sources from:

http://r-forge.r-project.org/R/?group_id=195

HTH

G


-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Symbol Font Baseline, Cairo, Card Symbols

2011-08-17 Thread Gavin Simpson
On Tue, 2011-08-16 at 17:46 -0700, ivo welch wrote:
 I think I found a bug in the Cairo library, plus weird behavior in
 both the Cairo and the normal pdf device.  The baseline of the spades
 symbol seems to be off.  This is easier to show than it is to explain.
  The problem does not appear in the normal pdf device, which is why I
 am guessing this is a Cairo bug.  moreover, I cannot figure out why
 three of the card symbols seem to be transparent, but the fourth is
 not.  this is the case for both the Cairo and the ordinary pdf
 devices.
 
 I thought I would report it to the R wizards on this group...

Why? One is expressly asked to address such problems with the package
maintainer(s), especially if you feel there is a bug in the package.

And whilst Martin M is potentially indisposed at UseR! 2011, I will
mention that it is a *package* not a library when you refer to the R
package Cario. If you had identified a bug in the Cairo *library* you
should be reporting it upstream to the Cairo people:
http://www.cairographics.org/

G

 library(Cairo)
 
 clubs - expression(symbol('\247'))
 hearts - expression(symbol('\250'))
 diamonds - expression(symbol('\251'))
 spades - expression(symbol('\252'))
 csymbols - c(clubs, diamonds, hearts, spades)
 
 CairoPDF(file = cardsymbols.pdf)
 
 plot( 0, xlim=c(0,5), ylim=c(0,2), type=n )
 clr - c(black, red, red, black)
 for (i in 1:4) {
   hline - function( yloc, ... ) for (i in 1:length(yloc)) lines(
 c(-1,6), c(yloc[i],yloc[i]), col=gray)
   hline(0.9);  hline(1.0);   hline(1.1);   hline(1.2)
   text( i, 1, csymbols[i], col=clr[i], cex=5 )
   text( i, 0.5, csymbols[i], col=clr[i] )
 }
 
 dev.off()
 
 
 
 
 Ivo Welch (ivo.we...@gmail.com)
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Symbol Font Baseline, Cairo, Card Symbols

2011-08-17 Thread Gavin Simpson
On Wed, 2011-08-17 at 08:06 -0700, ivo welch wrote:
 hi gavin---I am not even sure that it is a cairo bug, much less do I
 know about the details where it sits.  for all I know, it could be an
 Apple problem. the possible bug report was not only for the cairo
 package (what's the difference between a package and a library?  in my
 user R code, I invoke it as library(cairo)), but also for others who
 may use it and be surprised when they run into the same issue,
 wondering if it is their code, or a more general issue..

In the real world, libraries contain books. In the R world, libraries
contain packages. In the same way that in the real world it would be
silly and wrong to call a book a library, it is wrong to call an R
package a library.

The thing that resides on CRAN is the R *package* named Cairo. It
works by making use of a general purpose *software* *library* called
Cairo, details of which can be found here
http://www.cairographics.org/ .

I appreciate that the name of the `library()` function used to load
packages for use is somewhat counter-intuitive, but there you go. There
is talk of moving to a `use()` function to replace `library()`; when/if
that happens, perhaps it will lessen the confusion.

Even if you are not sure, the general advice is to email the
maintainers. If there is nothing wrong and this is a quirk of
fonts/system/OS etc then your archived email lives on via Google and may
confuse others in the future. Giving package authors a chance to look at
the problem is also common courtesy. The Posting Guide is clear in this
regard:

If the question relates to a contributed package , e.g., one downloaded
from CRAN, try contacting the package maintainer first.

Many people ignore this however.

G

 /iaw
 
 
 Ivo Welch (ivo.we...@gmail.com)
 
 
 
 
 
 On Wed, Aug 17, 2011 at 5:18 AM, Gavin Simpson gavin.simp...@ucl.ac.uk 
 wrote:
  On Tue, 2011-08-16 at 17:46 -0700, ivo welch wrote:
  I think I found a bug in the Cairo library, plus weird behavior in
  both the Cairo and the normal pdf device.  The baseline of the spades
  symbol seems to be off.  This is easier to show than it is to explain.
   The problem does not appear in the normal pdf device, which is why I
  am guessing this is a Cairo bug.  moreover, I cannot figure out why
  three of the card symbols seem to be transparent, but the fourth is
  not.  this is the case for both the Cairo and the ordinary pdf
  devices.
 
  I thought I would report it to the R wizards on this group...
 
  Why? One is expressly asked to address such problems with the package
  maintainer(s), especially if you feel there is a bug in the package.
 
  And whilst Martin M is potentially indisposed at UseR! 2011, I will
  mention that it is a *package* not a library when you refer to the R
  package Cario. If you had identified a bug in the Cairo *library* you
  should be reporting it upstream to the Cairo people:
  http://www.cairographics.org/
 
  G
 
  library(Cairo)
 
  clubs - expression(symbol('\247'))
  hearts - expression(symbol('\250'))
  diamonds - expression(symbol('\251'))
  spades - expression(symbol('\252'))
  csymbols - c(clubs, diamonds, hearts, spades)
 
  CairoPDF(file = cardsymbols.pdf)
 
  plot( 0, xlim=c(0,5), ylim=c(0,2), type=n )
  clr - c(black, red, red, black)
  for (i in 1:4) {
hline - function( yloc, ... ) for (i in 1:length(yloc)) lines(
  c(-1,6), c(yloc[i],yloc[i]), col=gray)
hline(0.9);  hline(1.0);   hline(1.1);   hline(1.2)
text( i, 1, csymbols[i], col=clr[i], cex=5 )
text( i, 0.5, csymbols[i], col=clr[i] )
  }
 
  dev.off()
 
 
 
  
  Ivo Welch (ivo.we...@gmail.com)
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
  --
  %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
   Dr. Gavin Simpson [t] +44 (0)20 7679 0522
   ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
   Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
   Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
   UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
  %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read

Re: [R] reflecting a PCA biplot

2011-08-10 Thread Gavin Simpson
On Tue, 2011-08-09 at 22:57 +1000, Andrew Halford wrote:
 Hi Listers,
 
 I am trying to reflect a PCA biplot in the x-axis (i.e. PC1) but am not
 having much success. In theory I believe all I need to do is multiply the
 site and species scores for the PC1 by -1, which would effectively flip the
 biplot.
 
 I am creating a blank plot using the plot command and accessing the results
 from a call to rda. I then use the calls to scores to obtain separate site
 and species coordinates and I have worked out how to multiply the
 appropriate PC1 scores by -1 to create the site and species scores I want.
 However I am not sure how to change the call to plot which accesses the
 results of the call to rda to draw the blank plot. The coordinates it is
 accessing are for the unreflected ordination and this does not match the new
 site and species scores that I have.
 
 
  fish.pca - rda(fish.hel)
  fish.site - scores(fish.pca,display=sites,scaling=3)
  fish.spp - scores(fish.pca,display=species,scaling=3)
 
  fish.site[,PC1] - -1*(fish.site[,PC1])
  fish.spp[,PC1] - -1*(fish.spp[,PC1])
 
  graph - plot(fish.pca,display=c(sites,species),type=n,scaling=3) #
 how do I get the plot to draw up the blank display based on the reversed
 site and species scores?

Do you mean something like...?

require(vegan)
data(dune)
mod - rda(dune)
si.scrs - scores(mod, display = sites, scaling = 3, choices = 1:2)
sp.scrs - scores(mod, display = species, scaling = 3, choices = 1:2)

si.scrs[, PC1] - -1 * si.scrs[, PC1]
sp.scrs[, PC1] - -1 * sp.scrs[, PC1]

xlim - range(0, si.scrs[, 1], sp.scrs[, 1])
ylim - range(0, si.scrs[, 2], sp.scrs[, 2])

plot(si.scrs[,1], si.scrs[,2], ylim = ylim, xlim = xlim,
 ylab = PC2, xlab = PC1, cex = 0.7, asp = 1)
abline(h = 0, lty = dotted)
abline(v = 0, lty = dotted)
points(sp.scrs[,1], sp.scrs[,2], col = red, pch = 3, cex = 0.7)
box()

For non-standard plotting of ordination objects, our advice has always
been to build the plot up from lower-level plotting functions rather
than the specific methods supplied with vegan.

A comparison: (not quite the same, I know, but close enough)

layout(matrix(1:2, ncol = 2))
plot(mod, display=c(sites,species), type = p, scaling=3,
 main = Original)
plot(si.scrs[,1], si.scrs[,2], ylim = ylim, xlim = xlim,
 ylab = PC2, xlab = PC1, cex = 0.7, asp = 1,
 main = Flipped PC1)
abline(h = 0, lty = dotted)
abline(v = 0, lty = dotted)
points(sp.scrs[,1], sp.scrs[,2], col = red, pch = 3, cex = 0.7)
box()
layout(1)

If you want type = t, the default for plot.cca, then use the same
coordinates but extract the relevant labels from the two scores objects:

plot(si.scrs[,1], si.scrs[,2], ylim = ylim, xlim = xlim,
 ylab = PC2, xlab = PC1, cex = 0.7, asp = 1,
 type = n) ## no plotting this time first
abline(h = 0, lty = dotted)
abline(v = 0, lty = dotted)
text(si.scrs[,1], si.scrs[,2], labels = rownames(si.scrs),
 cex = 0.7) ## add site and species scores as labels
text(sp.scrs[,1], sp.scrs[,2], labels = rownames(sp.scrs),
 col = red, cex = 0.7)
box()

HTH

G

 Any help appreciated.
 
 cheers
 
 Andy

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Running R in a sandbox

2011-08-03 Thread Gavin Simpson
On Wed, 2011-08-03 at 11:04 +0300, Antonio Rodriges wrote:
 Hello,
 
 The idea is to grant access of remote users to R running on Linux.
 Users must have ability to run their
 R scripts but avoid corrupting the operating system.
 
 How one can restrict/limit access of remote users to certain R
 functions? For example, dealing with IO (file system), graphical
 tools, etc.

We've been here before, IIRC. But I'm too lazy to check the archives -
that's your job ;-)

Try a search on http://finzi.psych.upenn.edu/search.html for relevant
terms and make sure you turn on the email lists and off the
functions/vignettes.

 Thank you.

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with R

2011-08-02 Thread Gavin Simpson
On Thu, 2011-07-28 at 11:58 -0400, Sarah Goslee wrote:
 Hi Mark,
 
 On Thu, Jul 28, 2011 at 10:44 AM,  m...@statcourse.com wrote:
 
1.  How can I plot the entire tree produced by rpart?
 
 What does plot() not do that you are expecting?

Not do any labelling... ;-)

text(tree)

where `tree` is your fitted tree will add the labels after using
`plot()` as per Sarah's reply.

 
2.  How can I submit a vector of values to a tree produced by rpart and 
  have
it make an assignment?
 
 What does predict() not do that you are expecting?

Indeed.

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Elegant way to subtract matrix from array

2011-07-27 Thread Gavin Simpson
On Wed, 2011-07-27 at 01:06 -0700, steven mosher wrote:
 there are really two related problems here
 
 I have a 2D matrix
 
 
 A - matrix(1:100,nrow=20,ncol =5)
 
 
 S - matrix(1:10,nrow=2,ncol =5)
 
 
 #I want to subtract S from A. so that S would be subtracted from the
 first 2 rows of
 
 #A, then the next two rows and so on.

For this one, I have used the following trick to replication a matrix

do.call(rbind, rep(list(mat), N)

where we convert the matrix, `mat`, to a list and repeat that list `N`
times, and arrange for the resulting list to be rbind-ed. For your
example matrices, the following does what you want: 
 
A - do.call(rbind, rep(list(S), nrow(A)/nrow(S)))

Whether this is useful will depend on the dimension of A and S - from
your posts on R-Bloggers, I can well imagine you are dealing with large
matrices.

 #I have a the same problem with a 3D array
 
 # where I want to subtract Q for every layer (1-10) in Z
 
 # I thought I solved this one with  array(mapply(-,Z,Q),dim=dim(Z))
 
 # but got the wrong answers
 
 
 Z - array(1:100,dim=c(2,5,10))
 
 Q - matrix(1:10,nrow=2,ncol =5)

For this one, consider the often overlooked function `sweep()`:

sweep(Z, c(1,2), Q, -)

does what you wanted. c(1,2) is the `MARGIN` argument over the
dimensions that Q will be swept from.

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function lm, get back the coefficient

2011-07-26 Thread Gavin Simpson
On Tue, 2011-07-26 at 16:43 +0100, Barry Rowlingson wrote:
 On Tue, Jul 26, 2011 at 4:21 PM, ascoquel ascoq...@yahoo.fr wrote:
  Hi,
 
  I've done a linear fit on my data and I would like to get back the a (time)
  coefficient ...
 
  mod-lm(res_sql2$Lx0x~0+time)
  result-data.frame()
  result-coef(mod)
  print(result)
  print(result)
  [1] result
   time
  0.02530191
 
  But I would like just the value 0.02530191 ... I tried result$time but it
  doesn't work ...
 
  It is 'just the value'. It happens to be in a named vector with a
 length of 1. You can do anything to it that you want to do with any
 other number. Try result * 2, or sqrt(result).
 
  If it really annoys you, try names(result)=NULL to get rid of the name.
 
 Barry

`unname()` exists for this purpose:

 foo - 1
 names(foo) - bar
 foo
bar 
  1 
 unname(foo)
[1] 1

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function lm, get back the coefficient

2011-07-26 Thread Gavin Simpson
On Tue, 2011-07-26 at 13:42 -0700, Bert Gunter wrote:
 Not quite, Gavin. You have to assign the value of unname back:

Yes; thought that would be a given. The point was to give a simple
example to show that `unname()` removes names/dimnames.

G

  z - structure(2, names=a)
  unname(z)
 [1] 2
  z
 a
 2
  zz - unname(z)
  zz
 [1] 2
  names(z) - NULL
  z
 [1] 2
 
 Cheers,
 Bert
 
 
 On Tue, Jul 26, 2011 at 1:18 PM, Gavin Simpson gavin.simp...@ucl.ac.uk 
 wrote:
  On Tue, 2011-07-26 at 16:43 +0100, Barry Rowlingson wrote:
  On Tue, Jul 26, 2011 at 4:21 PM, ascoquel ascoq...@yahoo.fr wrote:
   Hi,
  
   I've done a linear fit on my data and I would like to get back the a 
   (time)
   coefficient ...
  
   mod-lm(res_sql2$Lx0x~0+time)
   result-data.frame()
   result-coef(mod)
   print(result)
   print(result)
   [1] result
time
   0.02530191
  
   But I would like just the value 0.02530191 ... I tried result$time but it
   doesn't work ...
 
   It is 'just the value'. It happens to be in a named vector with a
  length of 1. You can do anything to it that you want to do with any
  other number. Try result * 2, or sqrt(result).
 
   If it really annoys you, try names(result)=NULL to get rid of the name.
 
  Barry
 
  `unname()` exists for this purpose:
 
  foo - 1
  names(foo) - bar
  foo
  bar
   1
  unname(foo)
  [1] 1
 
  G
  --
  %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
   Dr. Gavin Simpson [t] +44 (0)20 7679 0522
   ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
   Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
   Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
   UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
  %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multiple plots in single frame: 2 upper, 1 lower

2011-07-21 Thread Gavin Simpson
On Wed, 2011-07-20 at 23:38 +1200, Rolf Turner wrote:
 On 20/07/11 11:07, DrCJones wrote:
  Hi,
 
  par(mfrow = c(2,2))
 
  will create a 2x2 window that I can use to plot 4 diferent figures in:
  [plot1 plot2]
  [plot3 plot4]
 
  But how can do 3 so that the bottom spans the width of the upper two:
 
  [plot1 plot1]
  [p   l  o  t 3]
 
  Is this possible in R?
 
 In R ***anything*** is possible. :-)
 
 Your requirement is no only possible, but easy!
 
 See ?layout
 
 You may have to expend a bit of effort to understand the syntax, but
 that will be good for your karma. :-) 

It may help the OP to think of the layout as a 2*2 matrix:

1 2
3 4

say, with plot 3 using regions 3 and 4. If we fill the same matrix with
the plot number we want to draw in it, we have

1 2
3 3

From there it is easy to specify the layout by directly building that
matrix in R:

 (m - matrix(c(1:3,3), ncol = 2, byrow = TRUE))
 [,1] [,2]
[1,]12
[2,]33

and pass that to `layout()`

layout(m)
## invisible() just to stop replicate returning something visible
invisible(replicate(3, plot(1:10))) 
layout(1)

HTH

G

  It ***will*** do exactly what you
 want, if you ask it nicely.
 
  cheers,
 
  Rolf Turner
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] PERMANOVA+ and adonis in vegan package

2011-07-12 Thread Gavin Simpson
On Sat, 2011-07-09 at 09:35 -0700, VG wrote:
 Hi,
 
 I was wondering if someone can tell me what is the difference between
 strata argument (function adonis in vegan package) and
 using random effects in PERMANOVA+ add-on package to PRIMER6 when doing
 permutational MANOVA-s? Is the way permutations are done the same?
 
 Thank you very much in advance,
 Vesna 

By the looks of the Primer page on this new add-on, there are
substantial differences between adonis() and PERMANOVA+. The current way
that adonis() permutes data is either to shuffle the data totally at
random, or at random *within* the levels of `strata`. This conditions
the permutations on the clusterings of samples, and reflects a null
hypothesis where samples are not fully freely exchangeable
(independent), but exchangeable only within the groups/clusters of
samples.

It appears PERMANOVA+ has facilities for more complex permutations that
this. I have implemented some of these restricted permutations in my
package `permute` - on r.forge within the vegan stable of packages -
that will hopefully be used within vegan for all it's permutations,
perhaps by as soon as the end of the summer break.

As PERMANOVA+ is closed source and I have not seen any works from Marti
Anderson describing the newer features of her software, which are
included in the new PRIMER add-on, I would suggest you enquire with the
PRIMER people as to the similarities/differences between adonis() and
PERMANOVA+ and report back your findings?

HTH

G

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/PERMANOVA-and-adonis-in-vegan-package-tp3656397p3656397.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] FW: lasso regression

2011-07-12 Thread Gavin Simpson
On Tue, 2011-07-12 at 10:12 -0400, Heiman, Thomas J. wrote:
 Hi,
 
 Hopefully I got the formatting down.. I am trying to do a lasso regression 
 using the lars package with the following data (the data files is in .csv 
 format):
 
   V1  V2  V3  V4  
 V5  V6  V7  V8  V9
 1 FastestTime WinPercentage   PlacePercentage ShowPercentage  
 BreakAverageFinishAverage   Time7AverageTime3AverageFinish
 2 116.9   0.14285715  0.14285715  0.2857143   
 4.4285713.2857144   117.557144  117.76667   5.0
 3 116.22857   0.2857143   0.42857143  0.14285715  
 6.1428572.142857116.84286   116.8   2.0
 4 116.41428   0.0 0.14285715  0.2857143   
 5.7142863.7142856   117.24286   117.14  4.0
 5 115.8   0.5714286   0.0 0.2857143   
 2.1428572.5714285   116.21429   116.5   6.0
 
 #load Data
 crs- 
 read.csv(file:///C:/temp/Horse//horseracing.csvfile:///C:\temp\Horse\horseracing.csv,
  na.strings=c(,, NA, , ?), encoding=UTF-8)
 
 ## define x and y
 x= x-crs[,9]#predictor variables
 y= y-crs[1:8,]  #response variable
 
 
 library(lars)
 cv.lars(x, y, K=10, trace=TRUE, plot.it = TRUE,se = TRUE, type=lasso)
 
 and I get:
 
 LASSO sequence
 Error in one %*% x : requires numeric/complex matrix/vector arguments
 
 Any idea on what I am doing wrong?  Thank you!!

Row 1 contains character data, the variable names. Are you missing a
`header = TRUE` (this is the default in `read.csv()`), or do you have
several header lines?

I also think you have the response/predictors back to front there;
otherwise, why would you need to shrink the coefficient and select from
a model with a single predictor?

HTH

G

 Sincerely,
 
 tom
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] relative euclidean distance

2011-07-07 Thread Gavin Simpson
On Wed, 2011-07-06 at 14:50 +0100, Sarah Leclaire wrote:
 Hi,
 
 I would like to calculate the RELATIVE euclidean distance. Is there a 
 function in R which does it ?
 
 (I calculated the abundance of 94 chemical compounds in secretion of 
 several individuals, and I would like to have the chemical distance 
 between 2 individuals as expressed by the relative euclidean distance. 
 Some compounds are in very low abundance whereas others are in high 
 abundance, that's why I would like to correct for the abundance)
 
 Thanks,
 Sarah

A simple solution to this is to transform the data and then compute the
Euclidean distance using dist().

decostand(foo, method = normalize) and

disttransform(foo, method = chord) in package BiodiversityR 

can do this for you without you having to write a function yourself.
Pass the returned object to dist() to get the distances you want.

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] permil symbol linux

2011-07-06 Thread Gavin Simpson
On Wed, 2011-07-06 at 12:24 +0200, mathew brown wrote:
 To all in R land,
 
 Here is what I would like to do.
 x = c(1:10)
 y = c(1:10)
 plot(x,y)
 mtext(side=2, line=1.5, expression(*delta*^18*O [permill]), cex=1, 
 adj=0.5)
 
 That's it. Except I would like to replace permill with the symbol.
 Thanks for the help

Are you trying to be difficult? Have you read the posting guide so you
can provide the at minimum information Prof. Ripley asked you for?
Simply repeating the question ad nauseam isn't what was requested.

On my linux box, this will produce a d^{18}O permille **on screen** -
not sure why you want this in square brackets?

plot(1:10, ylab = expression(delta^{18}*O ~ (\u2030)))

Now that glyph doesn't appear to be in the pdf device fonts for my
encoding so when plotting to pdf() I have to add
`encoding=WinAnsi.enc` to my `pdf()` call - IIRC this advice at the
time was supplied by Prof. Ripley. The advice may well be different on
your **unstated** OS - hence the requests for more info.

This is with:

 sessionInfo()
R version 2.13.0 Patched (2011-04-19 r55527)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8   LC_NUMERIC=C  
 [3] LC_TIME=en_GB.UTF-8LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8   LC_NAME=C 
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
base 

other attached packages:
[1] cocorresp_0.1-9 vegan_1.17-9

which would be part of the at minimum info Prof. Ripley asked for. And
also which device (windows(), x11(), quartz(), pdf(), postscript() etc,
etc.) do you *want* to print on?

Please do read the posting guide - you might not be aware what
information is required to answer your question, so what seems
superfluous may well be, and in this case probably is, *essential*.

G

 
 On Wed, 6 Jul 2011 11:07:55 +0100 (BST)
 Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
 
  You still have not sent the information requested in the posting 
  guide, and I do not know your target device nor locale, nor is that a 
  reproducible example.
  
  Please learn some respect for the time of the helpers here (and for 
  all the work that went into making this possible in R).
  
  On Wed, 6 Jul 2011, mathew brown wrote:
  
   Good point.
  
   Here is the code
   plot(dat$timestamp,dat$delta_18_16, ylab=, xlab=(min), tck=0.05, 
   col=blue)
   mtext(side=2, line=1.5, expression(*delta*^18*O [\u0089]), cex=1, 
   adj=0.5)
  
   I'm still not sure how to get your code to work.
  
  Unsurprising, as you have not seen *my* code.
  
   thanks
  
   On Wed, 6 Jul 2011 09:45:49 +0100 (BST)
   Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
  
   On Wed, 6 Jul 2011, mathew brown wrote:
  
   Hi,
  
   I'm trying to figure out how to make a plot with ylab showing the
   permil symbol. Anyone know how to do this?
  
   Yes.
  
   Now, it you would follow the posting guide and give the 'at a minimum'
   information you were asked for, and the graphics device you want to
   use, we might be able to tell you more precisely how.
  
   Hint: that symbol is \u0089, and also in latin-1.
  
   Thanks
  
   --
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide 
   http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
  
   --
   Brian D. Ripley,  rip...@stats.ox.ac.uk
   Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
   University of Oxford, Tel:  +44 1865 272861 (self)
   1 South Parks Road, +44 1865 272866 (PA)
   Oxford OX1 3TG, UKFax:  +44 1865 272595
  
  
   --
   Mathew Brown
   Institute of Bioclimatology
   University of Göttingen
   Büsgenweg 2
   37077 Göttingen, Germany
   t: +49 551 39 9359
   mathew.br...@forst.uni-goettingen.de
  
  
  -- 
  Brian D. Ripley,  rip...@stats.ox.ac.uk
  Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
  University of Oxford, Tel:  +44 1865 272861 (self)
  1 South Parks Road, +44 1865 272866 (PA)
  Oxford OX1 3TG, UKFax:  +44 1865 272595
 
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R

Re: [R] Standards for delivery of GPL software in CRAN packages

2011-06-27 Thread Gavin Simpson
On Mon, 2011-06-27 at 11:14 -0400, Galkowski, Jan wrote:
 Fine.  Attached. It's waved.
 
 All it has is *.Rd files. Apparently the functions are collected in
 functionINIT.R. But 00Index and DESCRIPTION are not helpful.
 
  - j

The Rd files are the help or manual pages for the functions defined in
the package. In this particular case, the package author has decided to
put all the R code for their package into a single R source file -
functionINIT.R. The other two files are R-specific files, the latter of
which is used to describe the package; which, incidentally, points you
to a peer-reviewed paper that the package is support for.

I don't recall the GPL mentioning anything requiring that the source
code be helpful. The authors have most certainly fulfilled their
requirements under GPL, as has CRAN in distributing the package sources.

Or am I being obtuse and completely missing your point?

G

 -Original Message-
 From: b.rowling...@googlemail.com [mailto:b.rowling...@googlemail.com] On 
 Behalf Of Barry Rowlingson
 Sent: Monday, June 27, 2011 10:18 AM
 To: Galkowski, Jan
 Cc: r-help@r-project.org
 Subject: Re: [R] Standards for delivery of GPL software in CRAN packages
 
 On Mon, Jun 27, 2011 at 1:24 PM, Galkowski, Jan jgalk...@akamai.com wrote:
  I wondered if there were standard practices in CRAN for delivery of R 
  source implementing functions in R packages. I has encountered a couple of 
  packages where the gzipped version of source contains very little, 
  primarily the Help files describing the functions in the package. In some 
  cases I can find the source as the value of the function name.
 
  Given that these packages are released as GPL, oughtn't the unoptimized 
  source be freely available, hopefully with comments? Am I missing 
  something? Is there a central place other than mirrors where such source is 
  retained? Sourceforge?
 
 
 The 'package source' link on CRAN should point you to a tar.gz file
 that contains the source code. For example, for splancs off the heanet
 mirror it is:
 
 http://ftp.heanet.ie/mirrors/cran.r-project.org/src/contrib/splancs_2.01-27.tar.gz
 
  .tar.gz files from those links should have full R, C and Fortran source code.
 
 I think we need counter-examples...
 
 Barry
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Setting up a State Space Model in dlm

2011-06-10 Thread Gavin Simpson
On Tue, 2011-06-07 at 17:24 +0100, Michael Ash wrote:
 This question pertains to setting up a model in the package dlm
 (dynamic linear models,
 http://cran.r-project.org/web/packages/dlm/index.html

The author of the dlm package has just published a paper on state space
models in R including details on setting up dlm:

http://www.jstatsoft.org/v41/i04

That might help with your question - I haven't seen a reply on list, but
am unable to help answer it either.

HTH

G

 I have read both the vignette and An R Package for Dynamic Linear
 Models (http://www.jstatsoft.org/v36/i12/paper), both of which are
 very helpful.  There is also some discussion at
 https://stat.ethz.ch/pipermail/r-help/2009-May/198463.html
 
 I have what I think is a relatively straightforward state-space model
 but am unable to translate it into the terms of dlm.   It would be
 very helpful to get a basic dlm setup for the problem and I would
 guess that I can then modify it with more lags, etc., etc.
 
 The main equation is
 pi[t] = a * pi[t-1] + b*(U[t] - UN[t]) + e[t]
 
 (see 
 http://chart.apis.google.com/chart?cht=txchl=%5Cpi_t=a%5Cpi_{t-1}%2bb%28U_t-U^N_{t}%29%2Be_t
 for a pretty version)
 
 with pi and U observed, a and b fixed coefficients, and e a
 well-behaved error term (gaussian, say, variance unknown).
 The object of interest is the unobserved and time-varying component UN
 which evolves according to
 
 UN[t] = UN[t-1] + w[t]
 
 (see 
 http://chart.apis.google.com/chart?cht=txchl=U%5EN_%7Bt%7D%20=%20U%5EN_%7Bt-1%7D%20%2B%20%5Cepsilon_t
 for a pretty version)
 that is, a random walk with well-behaved error term with var(w) known
 (or assumed).
 
 I'm interested in the estimates of a and b and also in estimating the
 time series of UN.
 
 Note that the term b*(U[t] - UN[t]) makes this a nonlinear model.
 
 Below is code that does not work as expected.  I see the model as
 having four parameters, a, b, var(e), and UN.  (Or do I have a
 parameter UN[t] for every period?)
 
 I do not fully understand the dlm syntax.  Is FF specified properly?
 What should X look like?  How does m0 relate to parm()?
 
 
 I would be grateful if someone would be willing to glance at the code.
  Thanks. Michael
 
 library(quantmod)
 library(dlm)
 ## Get and organize the data
 getSymbols(UNRATE,src=FRED)  ## Unemployment rate
 getSymbols(GDPDEF,src=FRED)  ## Quarterly GDP Implicit Price Deflator
 u - aggregate(UNRATE,as.yearqtr,mean)
 gdpdef - aggregate(GDPDEF,as.yearqtr,mean)
 pi - diff(log(gdpdef))*400
 pilag - lag(pi,-1)
 tvnairu - cbind(pi,pilag,u)
 tvnairu.df - subset(data.frame(tvnairu), !is.na(pi)  !is.na(u) 
 !is.na(pilag))
 
 
 ## First attempt
 buildNAIRU - function(x) {
   modNAIRU - dlm(FF=t(matrix(c(1,1,1,0))),
   GG=diag(4),
   W=matrix(c(0,0,0,0,  0,0,0,0,  0,0,0.04,0,  0,0,0,0),4,4),
   V=exp(x[4]), m0=rep(0,4), C0=diag(1e07,4),
   JFF = t(matrix(c(1,1,0,0))),
   X=cbind( tvnairu.df$pilag, tvnairu.df$u))
   return(modNAIRU)
 }
 
 (fitNAIRU - dlmMLE(tvnairu.df$pi, parm=c(0,0,0,0) , build=buildNAIRU,
 hessian=TRUE, control=list(maxit=500)))
 (dlmNAIRU - buildNAIRU(fitNAIRU$par))
 
 
 ## Second attempt
 buildNAIRU - function(x) {
   modNAIRU - dlm(FF=t(matrix(c(1,1,0,0))),
   GG=diag(4),
   W=matrix(c(0,0,0,0, 0,0,0,0, 0,0,0.04,0, 0,0,0,0 ),4,4),
   V=exp(x[4]), m0=c(0.7,-0.3,5,1), C0=diag(100,4),
   JFF = t(matrix(c(1,1,0,0))),
   X=cbind( tvnairu.df$pilag, tvnairu.df$u - x[3] ))
   return(modNAIRU)
 }
 
 (fitNAIRU - dlmMLE(tvnairu.df$pi, parm=c(0,0,0,0) , build=buildNAIRU,
 hessian=TRUE, control=list(maxit=500)))
 (dlmNAIRU - buildNAIRU(fitNAIRU$par))
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] metaMDS and envfit: Help reading output

2011-05-23 Thread Gavin Simpson
On Wed, 2011-05-11 at 07:07 -0400, Michael Denslow wrote:
 Hi Katie,
 
 On Tue, May 10, 2011 at 4:51 PM, Songer, Katherine B - DNR
 katherine.son...@wisconsin.gov wrote:
  Hello R experts,
 
  I've used metaMDS to run NMDS on some fish abundance data, and am also 
  working on correlating environmental data to the NMDS coordinates. I'm 
  fairly new to metaMDS and NMDS in general, so I have what are probably some 
  very basic questions. My fish abundance data consists of 66 sites for which 
  up to 20 species of fish were identified and counted. I ran metaMDS on this 
  data in 3 dimensions (after using a scree plot to check for stress levels 
  in the different dimensions). I then used envfit to correlate a predictor 
  dataset of environmental variables with the NMDS results, using the 
  following code.
 
 Fish-as.data.frame(read.csv(Fish.csv,header=TRUE, sep = ,))
 Fish.mds-metaMDS(Fish,zerodist = add,k=3,trymax=20)
 
 Predictors-as.data.frame(read.csv(Predictors.csv,header=TRUE, sep = ,))
 Fish.fit - envfit(Fish.mds$points, Predictors, k=3, 1000, na.rm = TRUE)
 
 Have a look at the choices argument in envfit, etc. This is how you
 specify which axes you want to plot.
 
 ord - metaMDS(varespec, k=3)
 fit - envfit(ord, varechem,
   perm = 999, choices=c(1:3))

Indeed, this is an essential step because we probably shouldn't
interpret the axes of nMDS as separate components a la PCA/CA...

 fit
 plot(ord, choices=c(1,3))
 plot(fit, choices=c(1,3))

...as such, a 3-d plot might be better (for some definition of
better). See ?ordirgl

G

 Fish.fit
 
  The output of Fish.fit was as follows (table truncated):
 
 Dim1Dim2r2  Pr(r)
  DrainArea   -0.5923233  -0.8057004  0.7674  0.000999 ***
  Flow-0.5283236  -0.8490431  0.7847  0.000999 ***
  StrmWidth   -0.6993457  -0.7147836  0.6759  0.000999 ***
  Gradient0.4541225   0.8909392   0.2085  0.010989 *
 
  I'd like to better understand how to read this table. I understand that 
  Dim1 and Dim2 refer to the dimensions of the vectors produced by envfit, 
  and r2 is the r-squared of those vectors. But how do I visualize these 
  vectors in a 3-d plot? To which of the 3 NMDS dimensions are these vectors 
  being correlated? Is there code to produce the x, y, and z coordinates of 
  each of the sites in Fish.mds?
 
  Thanks very much.
  Katie
 
 
 [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with Princurve

2011-05-23 Thread Gavin Simpson
On Thu, 2011-05-19 at 06:43 -0700, guy33 wrote:
 Hey all,
 
 I can't seem to get the princurve package to produce correct results, even
 in the simplest cases.  For example, if you just generate a 1 period
 noiseless sine wave, and ask for the principal curve and plot, the returned
 curve is clearly wrong (doesn't follow the sine wave).  Here's my code:
 
 library(princurve)
 x - runif(1000,0,2*pi); x - cbind(x/(2*pi), sin(x))
 fit1 - principal.curve(x, plot = TRUE)
 
 Anyone have any suggestions?  If you run this code, do you get the correct
 principal curve?

How about specifying some useful starting points?

fit1 - principal.curve(x, plot = TRUE, trace = TRUE, maxit = 100,
start = cbind(sort(x[,1]), rep(1, nrow(x

And we need a few more iterations before convergence here. Starting from
the first principal component for example might give useful starting
points.

HTH

G

 Any help would be really appreciated!
 -guy33 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Problem-with-Princurve-tp3535721p3535721.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with Princurve

2011-05-23 Thread Gavin Simpson
On Mon, 2011-05-23 at 09:58 -0400, Ravi Varadhan wrote:
 Why does this not find a better solution?
 
  x - seq(0,2*pi, length=1000)
  x - cbind(x/(2*pi), sin(x))
 
  fit1 - principal.curve(x, plot = TRUE, trace = TRUE, maxit = 100,
 + start = cbind(sort(x[,1]), rep(1, nrow(x
 Starting curve---distance^2: 1499.5
 Iteration 1---distance^2: 3.114789
 Iteration 2---distance^2: 10.04492
 Iteration 3---distance^2: 11.89215
 Iteration 4---distance^2: 12.43235
 Iteration 5---distance^2: 12.68524
 Iteration 6---distance^2: 12.84443
 Iteration 7---distance^2: 12.93624
 Iteration 8---distance^2: 12.99118
 Iteration 9---distance^2: 13.01280
 Iteration 10---distance^2: 13.02867
 Iteration 11---distance^2: 13.03867
 
 
 You see that the projection distance is minimal at iteration 1, but
 the algorithm settles for an inferior projection (i..e. a greater
 projection distance).

I hadn't quite noticed that - I was too busy watching the algorithm
converge to some solution instead.

Hastie and Stuetzle make no claims about the algorithm converging or
indeed even that each step leads to a decrease in the criterion (sum of
squared distances to the curve).

It would appear that a smoother curve than the one identified in iter 1
is preferable when one considers the self-consistency check; smoothing
in each dimension using the distance along the current curve as a
predictor performs local averaging and the new principal curve
identified starts cutting the corners off the sine wave. gradually the
new curve derived by local averaging and the current curve are
sufficiently similar that the curve is declared self-consistent and the
algorithm stops.

I can achieve a better fit using my own wrappers around
principal.curve() and the smooth.spline() function used in the local
averaging, but only if I supply some good starting values. For my
wrapper and the starting configuration I suggested, I get a very close
solution to the sine wave original but it is a bit smoother - the rapid
change around the peak/troughs is smoothed out a little bit. GCV is
being used at each iteration in my wrappers to find the optimal
smoothness for smooths in both dimensions that describe the curve; the
solution fits the data almost as well as the best curve, but uses fewer
degrees of freedom in doing so.

As Henrik suggests, playing with the parameters of the smooth.spline
fitting engine helps - as that is the main thing my wrapper function
makes easier.

These data seem quite difficult for the algorithm to get good answers -
starting from the first or second principal component gives poor results
and a highly complex fitted curve that traverses back and forth across
the gaps between the peaks/troughs in the sine wave several times.

G

 If I do not provide any starting values, I get this:
 
  fit1 - principal.curve(x, plot = TRUE, trace = TRUE, maxit = 100)
 Starting curve---distance^2: 29692.03
 Iteration 1---distance^2: 20.31220
 Iteration 2---distance^2: 19.45939
 Iteration 3---distance^2: 19.26387
 Iteration 4---distance^2: 19.20626
 Iteration 5---distance^2: 19.18666
 Iteration 6---distance^2: 19.18059
 
 
 This is even worse.  It seems like the algorithm is quite sensitive to 
 starting values.  Is this behavior expected or is there some flaw in the 
 algorithm?
 
 Ravi. 
 
 
 From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf 
 Of Gavin Simpson [gavin.simp...@ucl.ac.uk]
 Sent: Monday, May 23, 2011 8:27 AM
 To: guy33
 Cc: r-help@r-project.org
 Subject: Re: [R] Problem with Princurve
 
 On Thu, 2011-05-19 at 06:43 -0700, guy33 wrote:
  Hey all,
 
  I can't seem to get the princurve package to produce correct results, even
  in the simplest cases.  For example, if you just generate a 1 period
  noiseless sine wave, and ask for the principal curve and plot, the returned
  curve is clearly wrong (doesn't follow the sine wave).  Here's my code:
 
  library(princurve)
  x - runif(1000,0,2*pi); x - cbind(x/(2*pi), sin(x))
  fit1 - principal.curve(x, plot = TRUE)
 
  Anyone have any suggestions?  If you run this code, do you get the correct
  principal curve?
 
 How about specifying some useful starting points?
 
 fit1 - principal.curve(x, plot = TRUE, trace = TRUE, maxit = 100,
 start = cbind(sort(x[,1]), rep(1, nrow(x
 
 And we need a few more iterations before convergence here. Starting from
 the first principal component for example might give useful starting
 points.
 
 HTH
 
 G
 
  Any help would be really appreciated!
  -guy33
 
  --
  View this message in context: 
  http://r.789695.n4.nabble.com/Problem-with-Princurve-tp3535721p3535721.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained

Re: [R] R forum for only Statistics

2011-05-07 Thread Gavin Simpson
On Thu, 2011-04-28 at 05:41 -0700, Jeremy Hetzel wrote:
 Vincy,
 
 In addition to the R-help mailing list, other forums for R and statistics 
 that I go to include:
 
 Stackoverflow's R tagged questions
 http://stackoverflow.com/questions/tagged/r

SO is *not* meant for statistical Q's - R or otherwise. People posting
such questions there will usually get short shrift, and find their Qs
closed or migrated to the Crossvalidated sister site that you list below
stats.stackexchange.com.

SO is for programming.

HTH

G

 StackExchange's Stats site:
 http://stats.stackexchange.com/
 
 A Google search for R statistics forum shows a few other forums that I'm not 
 familiar with.  It also shows a previous post to this list on a similar 
 topic: http://r.789695.n4.nabble.com/Statistics-forums-td873818.html , 
 where, at least in 2008, David Winsemius recommended the three sci.stat.* on 
 USENET, which are now also on Google Groups (
 https://groups.google.com/forum/#!forumsearch/sci.stat.*).  Perhaps others 
 have a better feel for which forums are of higher quality.  The sci.stat.* 
 groups look a bit overrun with spam these days.
 
 Jeremy
 
 
 
 On Thursday, April 28, 2011 7:26:06 AM UTC-4, Vincy Pyne wrote:
 
  Hi!
 
  I wish to know if there is any R forum which is meant only for Statistics? 
  I mean where we can clarify our statistics doubts and seek knowledge. I 
  know 
  there are lot many books and internet sites, but 'R forum' has altogether 
  different standard and very high level and one can learn a lot from them.
 
  Regards
 
  Vincy
 
 
  [[alternative HTML version deleted]]
 
  __
  r-h...@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 __ R-help@r-project.org mailing 
 list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting 
 guide http://www.R-project.org/posting-guide.html and provide commented, 
 minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using $ accessor in GAM formula

2011-05-06 Thread Gavin Simpson
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] principal components

2011-04-04 Thread Gavin Simpson
On Fri, 2011-04-01 at 11:52 +0530, nuncio m wrote:
 HI all,
  I am trying to compute the EOF of a matrix using prcomp but unable to get
 the expansion co-efficients.
 is it possible using prcomp or are there any other methods
 thanks
 nuncio
 

*sigh*

 RSiteSearch(EOF)

It is at times like this that an R equivalent of:

http://lmgtfy.com/

would be handy ;-)

Or it slightly cruder cousin...

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quick recode of -999 to NA in R

2011-03-30 Thread Gavin Simpson
On Wed, 2011-03-30 at 08:15 -0500, Christopher Desjardins wrote:
 Hi,
 I am trying to write a loop to recode my data from -999 to NA in R. What's
 the most efficient way to do this? Below is what I'm presently doing, which
 is inefficient. Thanks,
 Chris
 
 
dat0 - read.table(time1.dat)

dat0 - read.table(time1.dat, na.strings = c(NA,-999))

would seem the easiest option.

Do read ?read.table for details.

HTH

G

 colnames(dat0) - c(e1dq, e1arcp, e1dev, s1prcp, s1nrcp, s1ints,
 a1gpar, a1pias, a1devt)
 
 dat0[dat0$e1dq==-999.,e1dq] - NA
 
 dat0[dat0$e1arcp==-999.,e1arcp] - NA
 
 dat0[dat0$e1dev==-999.,e1dev] - NA
 
 dat0[dat0$s1prcp==-999.,s1prcp] - NA
 
 dat0[dat0$s1nrcp==-999.,s1nrcp] - NA
 
 dat0[dat0$s1ints==-999.,s1ints] - NA
 
 dat0[dat0$a1gpar==-999.,a1gpar] - NA
 
 dat0[dat0$a1pias==-999.,a1pias] - NA
 
 dat0[dat0$a1devt==-999.,a1devt] - NA
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Dirichlet surface

2011-03-30 Thread Gavin Simpson
On Wed, 2011-03-30 at 09:55 -0700, Kehl Dániel wrote:
 Dear David,
 
 I think that is a small bug too, maybe because the function is constant?
 is there a nice way to put the c(0,2.1) argument optionally, only if all 
 the parameters are 1?
 Should I post the problem somewhere else (developers maybe?)

I don't think this is a bug really; the code is having to compute limits
of the z axis and you supplied it one bit of information: a 2. If your
data are so degenerate then it is not unreasonable to expect some user
intervention. Admittedly, persp doesn't seem to work like other R
plotting functions.

You could do something like:

persp(x1, x2, z, 
  zlim = if(length(na.omit(unique(as.vector(z  2){ c(0,2.1) }
else { NULL})

in your call to persp so it only uses user-defined limits if the number
of numeric values in z is less than 2.

HTH

G

 thanks:
 Daniel
 
 2011-03-30 04:42 keltezéssel, David Winsemius írta:
 
  On Mar 29, 2011, at 4:45 PM, Kehl Dániel wrote:
 
  Dear list members,
 
  I want to draw surfaces of Dirichlet distributions with different 
  parameter settings.
  My code is the following:
  #begin code
  a1 - a2 - a3 - 2
  #a2 - .5
  #a3 - .5
  x1 - x2 - seq(0.01, .99, by=.01)
 
  f - function(x1, x2){
   term1 - gamma(a1+a2+a3)/(gamma(a1)*gamma(a2)*gamma(a3))
   term2 - x1^(a1-1)*x2^(a2-1)*(1-x1-x2)^(a3-1)
   term3 - (x1 + x2  1)
   term1*term2*term3
   }
 
  z - outer(x1, x2, f)
  z[z=0] - NA
 
  persp(x1, x2, z,
   main = Dirichlet Distribution,
   col = lightblue,
   theta = 50,
   phi = 20,
   r = 50,
   d = 0.1,
   expand = 0.5,
   ltheta = 90,
   lphi = 180,
   shade = 0.75,
   ticktype = detailed,
   nticks = 5)
  #end code
 
  It works fine (I guess), except for a1=a2=a3=1. In that case I get 
  the error: Error in persp.default...  invalid 'z' limits.
  The z matrix has only elements 2 and NA.
 
  Might be a buglet in persp. If you set the zlim argument to c(0,2.1), 
  you get the expected constant plane at z=2 for those arguments.
 
  Any ideas are appreciated.
 
  Thank you:
  Daniel
  University of Pécs
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
  David Winsemius, MD
  Heritage Laboratories
  West Hartford, CT
 
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Dirichlet surface

2011-03-30 Thread Gavin Simpson
On Wed, 2011-03-30 at 11:12 -0700, Kehl Dániel wrote:
 Actually, it works for the a=1 case, not for the others. It still gives 
 the invalid 'zlim argument' error.
 I'll try to work it out maybe instead of NULL giving a c which is 
 dependent on the max(z).

Sorry, I misread the helpfile - the other lims can be NULL but not zlim.
This works:

persp(x1, x2, z, 
  zlim = if(length(na.omit(unique(as.vector(z  2) { 
 c(0,2.1)
 } else { 
 range(z, na.rm = TRUE) 
 })

HTH

G

 Daniel
 
 2011-03-30 10:42 keltezéssel, Kehl Dániel írta:
  It helped a lot indeed, thank you very much!
  Now I understand why it was a problem for persp!
 
  Daniel
 
  2011-03-30 10:31 keltezéssel, Gavin Simpson írta:
  On Wed, 2011-03-30 at 09:55 -0700, Kehl Dániel wrote:
  Dear David,
 
  I think that is a small bug too, maybe because the function is 
  constant?
  is there a nice way to put the c(0,2.1) argument optionally, only if 
  all
  the parameters are 1?
  Should I post the problem somewhere else (developers maybe?)
  I don't think this is a bug really; the code is having to compute limits
  of the z axis and you supplied it one bit of information: a 2. If your
  data are so degenerate then it is not unreasonable to expect some user
  intervention. Admittedly, persp doesn't seem to work like other R
  plotting functions.
 
  You could do something like:
 
  persp(x1, x2, z,
 zlim = if(length(na.omit(unique(as.vector(z  2){ c(0,2.1) }
  else { NULL})
 
  in your call to persp so it only uses user-defined limits if the number
  of numeric values in z is less than 2.
 
  HTH
 
  G
 
  thanks:
  Daniel
 
  2011-03-30 04:42 keltezéssel, David Winsemius írta:
  On Mar 29, 2011, at 4:45 PM, Kehl Dániel wrote:
 
  Dear list members,
 
  I want to draw surfaces of Dirichlet distributions with different
  parameter settings.
  My code is the following:
  #begin code
  a1- a2- a3- 2
  #a2- .5
  #a3- .5
  x1- x2- seq(0.01, .99, by=.01)
 
  f- function(x1, x2){
term1- gamma(a1+a2+a3)/(gamma(a1)*gamma(a2)*gamma(a3))
term2- x1^(a1-1)*x2^(a2-1)*(1-x1-x2)^(a3-1)
term3- (x1 + x2  1)
term1*term2*term3
}
 
  z- outer(x1, x2, f)
  z[z=0]- NA
 
  persp(x1, x2, z,
main = Dirichlet Distribution,
col = lightblue,
theta = 50,
phi = 20,
r = 50,
d = 0.1,
expand = 0.5,
ltheta = 90,
lphi = 180,
shade = 0.75,
ticktype = detailed,
nticks = 5)
  #end code
 
  It works fine (I guess), except for a1=a2=a3=1. In that case I get
  the error: Error in persp.default...  invalid 'z' limits.
  The z matrix has only elements 2 and NA.
  Might be a buglet in persp. If you set the zlim argument to c(0,2.1),
  you get the expected constant plane at z=2 for those arguments.
  Any ideas are appreciated.
 
  Thank you:
  Daniel
  University of Pécs
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
  David Winsemius, MD
  Heritage Laboratories
  West Hartford, CT
 
 
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ordination in vegan

2011-03-28 Thread Gavin Simpson
On Sun, 2011-03-27 at 21:40 -0700, Akane Nishimura wrote:
 Hi all,

Hi Akane,

I answered this on the R-forge forum where you first posted it. I have
posted a follow-up there too. More below...

 I have site data with plant species cover and am looking for trends. I'm
 kind of new to this, but have done lots of reading and can't find an answer.
 
 I tried decorana (I know it's been replaced by ca.) and see a trend, but I'm
 not sure what it means. Is there a way to get the loadings/eigenvectors of
 the axes (like in PCA)? Is there a way to do this with rda() too? How about
 with cca()?  I know I can do scores() or summary() but that's not exactly
 the info I need.  I basically want to interpret the axes.
 
 Also after doing envfit() and plotting, is there a way to label sites based
 on certain environmental factors?  For instance, I want to highlight plots
 that have a certain treatment (or have different colored points for the
 different treatments.  
 
 
 
 I don't know if it matters but I used downweight() before applying
 decorana().

Here is a copy of the response on the Vegan forum:

There are *two* sets of scores, species and sites (samples). The
terminology in ordination in ecology is a little different to that of
PCA with scores and loadings. Think of the site (sample) scores as the
scores and the species scores as the loadings in a DCA. Also, the
species scores represent the fitted optima of each species in the
ordination space.

If the above doesn't make any sense, consider a good text on ordination
in ecology, such as Legendre and Legendre (1998) Numerical Ecology 2nd
English Edition, or Jongman et al (1995) Data Analysis in Community and
Landscape Ecology.

As for envfit(), like I said, you need to build the plot up by hand.
Here is an example:

data(varespec)
data(varechem)
library(MASS)
ord - metaMDS(varespec)
fit - envfit(ord, varechem, perm = 999)
## create two groups to represent two treatments
## you'd use your actual Treatment variable, preferably as a factor
Treatment - gl(2, NROW(varespec) / 2)

## build up plot
plot(ord, type = n)
points(ord, display = species)
## Treatment 1 in blue, Treatment 2 in red
points(ord, display = sites, col = c(blue,red)[Treatment])
plot(fit, add = TRUE)

Does that help?

As for downweight(), it isn't relevant really - you need to know what it
has done, but that is really just to reduce to weight of rare species in
the data set. For the definition of rare see the help page or the code
for the function - I can't remember what it is off the top of my head.

HTH

G

  
 
 down -downweight(spp2009bttl, fraction = 5)
 
 down.dca - decorana(down)
 
 scores(down.dca)
 
 summary(down.dca)
 
 
 
 Thanks so much!
 
  
 
 -Akane
 
  
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] exploring dist()

2011-03-20 Thread Gavin Simpson
On Fri, 2011-03-18 at 06:21 -0700, bra86 wrote:
 Hello, everybody, 
 
 I hope somebody could help me with a dist() function.
 I have a data frame of size 2*4087 (col*row), where col corresponds to the
 treatment and rows are

So you have 4087 species? If yes, normally, you'd have the species in
the columns and the samples/treatments in the row.

 species, values are Hellinger distances, I should reconstruct a distance
 matrix

This doesn't make sense - distances would mean you have a square
symmetric matrix but 2 * 4087 isn't square. Do you mean you have
Hellinger **transformed** the data such that when you take the Euclidean
distances of this transformed data you get the Hellinger distance rather
than the Euclidean distance?

If yes - and you sort the rows/columns issue - R wants the samples in
rows - then it is reasonably simple.

Here is a much simplified example with 5 species and 4 samples:

dat - data.frame(runif(4, 1, 10), runif(4, 2, 10), runif(4, 4, 20),
  runif(4, 1, 4), runif(4, 0, 5))
names(dat) - paste(spp, LETTERS[1:5])
rownames(dat) - paste(samp, 1:4)

So we have data that looks like this:

 dat
  spp Aspp B spp Cspp D spp E
samp 1 6.974237 7.933403  5.460453 3.975219 4.6818142
samp 2 1.049801 6.751013 14.143798 1.777532 4.0261914
samp 3 5.742314 2.243850 15.613524 3.476935 0.4144043
samp 4 5.985012 9.576440  8.722579 3.411262 1.8126338

Then I apply a Hellinger transformation:

require(vegan)
datH - decostand(dat, method = hellinger)

So at this point we have something that I think you are telling us you
have:

 datH
   spp A spp B spp C spp D spp E
samp 1 0.4901864 0.5228086 0.4337378 0.3700782 0.4016244
samp 2 0.1945069 0.4932488 0.7139447 0.2530989 0.3809156
samp 3 0.4570334 0.2856942 0.7536245 0.3556336 0.1227769
samp 4 0.4503635 0.5696823 0.5436922 0.3400073 0.2478481

We can use dist() on this data frame via:

dij - dist(datH)

If we look at the object created, we see the **printed** representation
of the dissimilarity matrix, which is a 4*4 matrix in this example:

 dij
  samp 1samp 2samp 3
samp 2 0.4253576
samp 3 0.4874570 0.4367179  
samp 4 0.2010581 0.3543312 0.3750363

Note that the diagonal and the upper triangle of the matrix are not
printed, or stored even, because they are trivial (0 for all diagonals
and the upper triangle is the same as the lower triangle).

dist() actually creates a vector of numbers that will fill the lower
triangle of the dissimilarity matrix. This saves on storage space. If
you want the add the diagonal and upper triangle, we can get it one of
two ways:

1) dist(datH, diag = TRUE, upper = TRUE)

2) as.matrix(dij)

However only the second actually returns a matrix with 16 numbers, the
former still only computes the 6 pair-wise distances, but when
**printed** it shows the full matrix.

If you really have species in rows and smaples in columns, then you can
transpose your matrix, e.g.

datH.t - t(dat.H)

and then compute the dissimilarity matrix as above.

Does this help?

G

 with a dist() function. I know that euclidean method should be used.
 
 When I type:
 dist(dframe,euclidean)
 it gives me a truncated table, where values are missing.
 
 I suppose that I have to define something for the values,
 but I have no idea what exactly, because I am not familiar with r at all.
 
 I would be very appreciated for every kind of suggestions or tips.
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/exploring-dist-tp3387187p3387187.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] persuade tabulate function to count NAs in a data frame

2011-03-19 Thread Gavin Simpson
 and any attached files are confidential and/...{{dropped:19}}
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with vegan package instalation on linux

2011-03-09 Thread Gavin Simpson
On Wed, 2011-03-09 at 11:41 -0300, Ronaldo Reis-Jr. wrote:
 Hi,
 
 I try to install vegan package on debian linux and I give this error:

Have you got a fortran compiler installed on your system? gcc-gfortran
is the package on my Fedora laptop.

G

 =
 * installing *source* package 'vegan' ...
 ** libs
 make: *** No rule to make target `cepin.o', needed by `vegan.so'.  Stop.
 *** arch - Rgnome
 Warning in file(con, r) :
   cannot open file '/usr/lib/R/etc/Rgnome/Makeconf': No such file or
 directory
 Error in file(con, r) : cannot open the connection
 * removing '/usr/local/lib/R/site-library/vegan'
 
 The downloaded packages are in
 '/tmp/RtmpYgJz5c/downloaded_packages'
 Warning message:
 In install.packages(vegan) :
   installation of package 'vegan' had non-zero exit status
 ==
 
 I try to find the problem without success.
 
 Anybody can help-me?
 
 Thanks
 Ronaldo
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rioja package, creating transfer function, WA, Error in FUN

2011-03-02 Thread Gavin Simpson
On Thu, 2011-02-10 at 09:40 -0800, mdc wrote:
 Hi, I am a new R user and am trying to construct a palaeoenvironmental
 transfer function (weighted averaging method) using the package rioja.
 I've managed to insert the two matrices (the species abundance and the
 environmental data) and have assigned them to the y and x values
 respectively. When I try and enter the 'WA' function though, I get an 'Error
 in FUN' message (see below for full values). Alas, I do not know what this
 means and have struggled to find similar problems to this online. Is there a
 step I've missed out between assigning the matrices and the WA function?  
 
  SWED=odbcConnectExcel(file.choose())   (SWED is the environmental data
  file)
  sqlTables(SWED)
  Env=sqlFetch(SWED, Sheet1)
  odbcClose(SWED)
  Env
 
SampleId WTD  Moisture   pH EC 
 1  N1_1   20 91.72700 3.496674  85.02688
 2  N1_22 93.88913 3.550794  85.69465
 3  N1_3   26 90.30269 3.948559 113.19206
 4  N1_45 94.14427 3.697213  48.56375
 5  N1_5   30 90.04269 3.745020 108.57278
 
 90 GAL_15 70 94.07849 3.777932  66.77673

That's your problem, the odbc stuff has read the data in as characters.
CSV would be a lot simpler, just save your excel sheets as CSV files and
read them in with:

Env - read.csv(my_excel_sheet.csv, row.names = 1)

etc... where my_excel_sheet.csv is the name of your saved csv file or
just use:

Env - read.csv(file.choose(), row.names = 1)

if finding files via the GUI is helpful to you.

It is odd that the species data set has been read in OK though - I say
OK, but you still need to get the F1 column out of the species data and
set it as the row names of your data.

Sorry I'm coming to this late; I've been away and not really following
the list for a few weeks.

If you can't get things working, contact me off list and send the Excel
files and I'll send back a script that will load the files and do the WA
for you to look at.

HTH

G

 
  STEST=odbcConnectExcel(file.choose())
  sqlTables(STEST)  (STEST is the
  species abundance file)
  Spe=sqlFetch(STEST, Sheet8)
  odbcClose(STEST)
  Spe
 
 (The species data contains the abundance of 32 species over 90 sites, set
 out like this)
F1AmpFlavAmpWri  ArcCat   ArcDis 
 1N1_1 22.2929936 0.000  0.000  0.000
 2N1_2 30.9677419 0.000  0.000  3.2258065
 
  library(rioja)
  y -as.matrix(Spe)
  x -as.matrix(Env)
 
  WA(y, x, tolDW = FALSE, use.N2=TRUE, check.data=TRUE, lean=FALSE)(the
  command from the WA section of the rioja booklet)
 Error in FUN(newX[, i], ...) : invalid 'type' (character) of argument
 
 
 Any help would be most appreciated, 
 Best wishes, 
 Matthew

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] different results in MASS's mca and SAS's corresp

2011-02-07 Thread Gavin Simpson
On Sat, 2011-02-05 at 23:39 -0600, Paul Johnson wrote:
 On Sat, Feb 5, 2011 at 9:19 AM, David Winsemius dwinsem...@comcast.net 
 wrote:
snip /
  cbind(scalermca[,1] * 0.827094, scalermca[,2] *  -0.7644828)
   [,1][,2]
 1   1.06070017 -0.8154
 2   0.77057891  0.63456780
 3   1.07031764 -1.30675217
 4   1.07031764 -1.30675217
 5   0.23075886  0.90002547
 6   0.6943  0.60993995
 7   0.10530240  0.78445402
 8  -0.27026650  0.44225049
 9   0.13426089  1.15670532
 10  0.11861965  0.64778456
 11  0.23807570  1.21775202
 12  1.01156703 -0.01927226
 13  0.28051938 -0.59805897
 14 -1.17343686 -0.27122981
 15 -0.83838041 -0.64003061
 16 -0.05453708 -0.22925816
 17 -0.91732401 -0.49899374
 18 -0.92694148 -0.00774156
 19 -1.30251038 -0.34994509
 20 -1.30251038 -0.34994509
 
 So, that does reproduce SAS exactly.  And I'm a little frustrated I
 can't remember the matrix command to get that multiplication done
 without cbinding the 2 columns together that way.

You might have been thinking of sweep():

sweep(scalermca[,1:2], 2, c(0.827094,-0.7644828), *)

snip/

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GAM quasipoisson in MuMIn

2011-02-04 Thread Gavin Simpson
On Fri, 2011-02-04 at 12:31 +, Karen Moore wrote:
 Hi,
 
snip /
 My formula is
 
 GAM-gam(Species richness (count) ~ Categorical +  Continous + Continous + *
 s*(Continous ) + Continous : Continous + Continous : Continous,
 family=quasipoisson, data =)

Ok, I'm reasonably certain that that is *not* your gam() call. Why not
post the *actual* code you would enter for your GAM model?

dd - dredge(GAM)

should evaluate all possible models from the global model you specified.
The example you quote is for a global model with both smooth and linear
terms for the same variables, hence in dredge() they need to select
models that include only the smooth for a particular variable or the
linear term for that variable, but not both.

By the looks of it, you don't have this problem.

G

 
 Thanks for any advice on script
 
 
 Karen
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Please stop all e-mail

2011-02-04 Thread Gavin Simpson
On Fri, 2011-02-04 at 13:06 +0800, cahyo kristiono wrote:
 Dear r-help
 
 I don't want to receive againevery e-mail about [R] in my address e-mail 
 (cahyo_kristi...@yahoo.com), because it is cause my inbox so full quickly. So 
 I 
 need your help to stop every e-mail about [R] in my address.

Remove yourself or change your subscription options  using the web
interface:

https://stat.ethz.ch/mailman/listinfo/r-help

G

 Thank you so much
 
 Regards
 CK
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] agnes clustering and NAs

2011-01-28 Thread Gavin Simpson
On Fri, 2011-01-28 at 10:00 +1100, Dario Strbenac wrote:
 Hello,
 
 Yes, that's right, it is a values matrix. Not a dissimilarity matrix.
 
 i.e.
 
  str(iMatrix)
  num [1:23371, 1:56] -0.407 0.198 NA -0.133 NA ...
  - attr(*, dimnames)=List of 2
   ..$ : NULL
   ..$ : chr [1:56] -8100 -7900 -7700 -7500 ...
 
 For the snippet of checking for NAs, I get all TRUEs, so I have at least one 
 NA in each column.

Sorry, my bad. Try this:

apply(iMatrix, 1, function(x) all(is.na(x)))

will check that you have no fully `NA` rows.

Also look at str(iMatrix) for potential problems.

Finally, try:

out - dist(iMatrix)
any(is.na(out))

should repeat what agnes is doing to compute the dissimilarity matrix.
If that returns TRUE, go and find which samples are giving NA
dissimilarity and why.

The issue is not NA in the input data, but that your input data is
leading to NA in the computed dissimilarities. This might be due to NA's
in your input data, where a pair of samples has no common set of data
for example.

HTH

G

 The part of the agnes documentation I was referring to is :
 
 In case of a matrix or data frame, each row corresponds to an observation, 
 and each column corresponds to a variable. All variables must be numeric.  
 Missing values (NAs) are allowed.
 
 So, I'm under the impression it handles NAs on its own ?
 
 - Dario.
 
  Original message 
 Date: Thu, 27 Jan 2011 12:53:27 +
 From: Gavin Simpson gavin.simp...@ucl.ac.uk  
 Subject: Re: [R] agnes clustering and NAs  
 To: Uwe Ligges lig...@statistik.tu-dortmund.de
 Cc: d.strbe...@garvan.org.au, r-help@r-project.org
 
 On Thu, 2011-01-27 at 10:45 +0100, Uwe Ligges wrote:
  
  On 27.01.2011 05:00, Dario Strbenac wrote:
   Hello,
  
   In the documentation for agnes in the package 'cluster', it says that 
   NAs are allowed, and sure enough it works for a small example like :
  
   m- matrix(c(
   1, 1, 1, 2,
   1, NA, 1, 1,
   1, 2, 2, 2), nrow = 3, byrow = TRUE)
   agnes(m)
   Call:agnes(x = m)
   Agglomerative coefficient:  0.1614168
   Order of objects:
   [1] 1 2 3
   Height (summary):
   Min. 1st Qu.  MedianMean 3rd Qu.Max.
  1.155   1.247   1.339   1.339   1.431   1.524
  
   Available components:
   [1] order  height ac merge  diss   call   method data
  
   But I have a large matrix (23371 rows, 50 columns) with some NAs in it 
   and it runs for about a minute, then gives an error :
  
   agnes(iMatrix)
   Error in agnes(iMatrix) :
  No clustering performed, NA-values in the dissimilarity matrix.
  
   I've also tried getting rid of rows with all NAs in them, and it still 
   gave me the same error. Is this a bug in agnes() ? It doesn't seem to 
   fulfil the claim made by its documentation.
  
  
  I haven't looked in the file, but you need to get rid of all NA, or in 
  other words, all rows that contain *any* NA values.
 
 If one believes the documentation, then that only applies to the case
 where `x` is a dissimilarity matrix. `NA`s are allowed if x is the raw
 data matrix or data frame.
 
 The only way the OP could have gotten that error with the call shown is
 if iMatrix were not a dissimilarity matrix inheriting from class dist,
 so `NA`s should be allowed.
 
 My guess would be that the OP didn't get rid of all the `NA`s.
 
 Dario: what does:
 
 sapply(iMatrix, function(x) any(is.na(x)))
 
 or if iMatrix is a matrix:
 
 apply(iMatrix, 2, function(x) any(is.na(x)))
 
 say?
 
 G
 
  Uwe Ligges
  
  
  
   The matrix I'm using can be obtained here :
   http://129.94.136.7/file_dump/dario/iMatrix.obj
  
   --
   Dario Strbenac
   Research Assistant
   Cancer Epigenetics
   Garvan Institute of Medical Research
   Darlinghurst NSW 2010
   Australia
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide 
   http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 -- 
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
  Dr. Gavin Simpson [t] +44 (0)20 7679 0522
  ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
  Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
  Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
  UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 
 
 
 --
 Dario Strbenac
 Research Assistant
 Cancer Epigenetics
 Garvan Institute of Medical Research
 Darlinghurst NSW 2010
 Australia

Re: [R] Quasi-poisson glm and calculating a qAIC and qAICc...trying to modilfy Bolker et al. 2009 function to work for a glm model

2011-01-28 Thread Gavin Simpson
On Thu, 2011-01-27 at 08:20 -0500, Jason Nelson wrote:
 Sorry about re-posting this, it never went out to the mailing list when I
 posted this to r-help forum on Nabble and was pending for a few days, now
 that I am subscribe to the mailing list I hope that this goes out:
 
 I've been a viewer of this forum for a while and it has helped out a lot,
 but this is my first time posting something.
 
 I am running glm models for richness and abundances.  For example, my beetle
 richness is overdispersed:
 
  qcc.overdispersion.test(beetle.richness)
 
 Overdispersion test Obs.Var/Theor.Var Statistic   p-value
poisson data  2.628131  23.65318 0.0048847
 
 So, I am running a simple glm with my distribution as quasipoisson
 
  glm.richness1-glm(beetle.richness~log.area, family = quasipoisson)
 
 
 Now I want to calculate a qAIC and qAICc.  I was trying to modify the
 equation that I found in Bolker et al 2009 supplemental material:
 
 QAICc - function(mod, scale, QAICc=TRUE){
 LL - logLik(mod)

You are out of luck there then; logLik is not defined for the quasi
families.

 ll - as.numeric(LL)
 df - attr(LL, df)
 n - length(mod$y)   #used $ to replace @ to make a S3 object
 if(QAICc)
 qaic = as.numeric( -2*ll/scale + 2*df +
 2*df*(df+1)/(n-df-1))
 else qaic =as.numeric( -2*ll/scale + 2*df)
 qaic
 }
 
 The only problem is that I have no idea how to scale it.  In Bolker at al.
 2009 it is scaled to phi:
 
 phi = lme4:::sigma(model)

phi, IIRC, is the dispersion parameter. You can get the estimated value
from `summary(model)$dispersion` where model is the result of a call to
glm().

 But I am not running a mixed model and I cannot run the qAICc function
 without scaling it.  I am comparing models to each other trying to find the
 best model for both landscape land use land cover data and patch variables.
  How would I set the scale if I run this function?
 
 QAICc(glm.richness1, scale = ?)
 
 Should I set the scale to the square root of the deviance?  phi =
 sqrt(glm.richness1$deviance)
 
 Your help is much appreciated.

Instead of resorting to these ad-hoc approaches to correct for
overdispersion, you would be better off fitting models that model the
overdispersion. Try a negative binomial (glm.nb() in MASS) or the
zeroinfl() and hurdle() functions in the pscl package. Those all have
proper log likelihoods and you can compute/extract AIC simply using
them.

 Regards,
 Jason

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help needed

2011-01-28 Thread Gavin Simpson
On Thu, 2011-01-27 at 09:50 -0500, Thomas Stewart wrote:
 For question 2,
 
 TTT - rt(1000,3)
 mean(TTT[rank(TTT) = 975  rank(TTT) 25])

mean(TTT, trim = 0.05)

What you are doing is only removing 5% of observations in total, whilst
the question asks for 5% removed off *each* end.

G

 
 On Thu, Jan 27, 2011 at 8:16 AM, Ben Boyadjian benjy_cy...@hotmail.comwrote:
 
  Hello I am trying to solve these problems and I am not allowed to use loops
  or ifs.
 
  1st Question
  My first question is that I have generated 100 random numbers from the
  uniform distribution then
  A)add only the negative integers.
  B)add elements until the first appearance of a negative element.
 
  I know how to choose the negative elements for A but how to find integers?
  And I dont know what to do for B.
 
  2nd Question
   Simulate 1000 observations from the student-t distribution with 3 degrees
  of freedom and then calculate the truncated mean by excluding bottom 5% and
  top 5%.
 
  Thank yoou
 
 
 [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extrapolating values from a glm fit

2011-01-27 Thread Gavin Simpson
On Wed, 2011-01-26 at 19:25 -1000, Ahnate Lim wrote:
 Even when I try to predict y values from x, let's say I want to predict y at
 x=0. Looking at the graph from the provided syntax, I would expect y to be
 about 0.85. Is this right:
 
 predict(mylogit,newdata=as.data.frame(0),type=response)

Your original problem was the use of `newdata = as.data.frame(0.5)`.
There are two problems here: i) if you don't name the input (x = 0.5,
say) then you get a data frame with the name(s) 0.5:

 as.data.frame(0.5)
  0.5
1 0.5

and ii) if you do name it, you still get a data frame with name(s) 0.5

 as.data.frame(x = 0.5)
  0.5
1 0.5

In both cases, predict wants to find a variable with the name `x` in the
object supplied to `newdata`. It finds `x` but your `x` in the global
workspace, but warns because it knows that `newdata` was a data frame
with a single row - so there was a mismatch and you likely made a
mistake.

In these cases, `data.frame()` is preferred to `as.data.frame()`:

predict(mylogit, newdata = data.frame(x = 0), type = response)

or we can use a list, to save a few characters:

predict(mylogit, newdata = list(x = 0), type = response)

which give:

 predict(mylogit, newdata = list(x = 0), type = response)
   1 
0.813069 
 predict(mylogit, newdata = data.frame(x = 0), type = response)
   1 
0.813069 

In summary, use `data.frame()` or `list()` to create the object passed
as `newdata` and make sure you give the component containing the new
values a *name* that matches the predictor in the model formula.

HTH

G

 
 # I get:
 
 Warning message:
 'newdata' had 1 rows but variable(s) found have 34 rows
 
 # Why do I need 34 rows? So I try:
 
 v=vector()
 v[1:34]=0
 predict(mylogit,newdata=as.data.frame(v),type=response)
 
 # And I get a matrix of 34 values that appear to fit a logistic function,
 instead of 0.85..
 
 
 
 
 On Wed, Jan 26, 2011 at 6:59 PM, David Winsemius 
 dwinsem...@comcast.netwrote:
 
 
  On Jan 26, 2011, at 10:52 PM, Ahnate Lim wrote:
 
   Dear R-help,
 
  I have fitted a glm logistic function to dichotomous forced choices
  responses varying according to time interval between two stimulus. x
  values
  are time separation in miliseconds, and the y values are proportion
  responses for one of the stimulus. Now I am trying to extrapolate x values
  for the y value (proportion) at .25, .5, and .75. I have tried several
  predict parameters, and they don't appear to be working. Is this correct
  use
  and understanding of the predict function? It would be nice to know the
  parameters for the glm best fit, but all I really need are the
  extrapolated
  x values for those proportions. Thank you for your help. Here is the code:
 
  x -
  c(-283.9, -267.2, -250.5, -233.8, -217.1, -200.4, -183.7, -167,
  -150.3, -133.6, -116.9, -100.2, -83.5, -66.8, -50.1, -33.4, -16.7,
  16.7, 33.4, 50.1, 66.8, 83.5, 100.2, 116.9, 133.6, 150.3, 167,
  183.7, 200.4, 217.1, 233.8, 250.5, 267.2, 283.9)
 
  y -
  c(0, 0.333, 0, 0, 0, 0, 0, 0, 0, 0.333,
  0, 0.133, 0.238095238095238, 0.528,
  0.567,
  0.845238095238095, 0.55, 1, 0.889, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5)
 
  weight -
  c(1, 3, 2, 5, 4, 4, 3, 5, 5, 4, 5, 11, 22, 11, 15, 16, 11, 7,
  14, 10, 16, 19, 11, 5, 4, 5, 6, 9, 4, 2, 5, 5, 2, 2)
 
  mylogit - glm(y~x,weights=weight, family = binomial)
 
  # now I try plotting the predicted value, and it looks like a good fit,
  hopefully I can access what the glm is doing
 
  ypred - predict(mylogit,newdata=as.data.frame(x),type=response)
  plot(x, ypred,type=l)
  points(x,y)
 
  # so I try to predict the x value when y (proportion) is at .5, but
  something is wrong..
 
 
  Right. Predict goes in the other direction ... x predicts y.
 
  Perhaps if you created a function of y w.r.t. x and then inverted it.
 
  ?approxfun  # and see the posting earlier this week Inverse Prediction
  with splines where it was demonstrated how to reverse the roles of x and y.
 
 
  predict(mylogit,newdata=as.data.frame(0.5))
 
 [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
  David Winsemius, MD
  West Hartford, CT
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e

Re: [R] Extrapolating values from a glm fit

2011-01-27 Thread Gavin Simpson
On Thu, 2011-01-27 at 00:10 -1000, Ahnate Lim wrote:
 Thank you for the clarification, this makes sense now! dose.p from
 MASS also does the job perfectly, in returning x values for specified
 proportions. I'm curious, if I use the other parameter in predict.glm
 type=link (instead of type=response), in the case of a logistic
 binomial, what does this predict?

This is the predicted value(s) on the scale of the linear predictor, or
the link function, depending on terminology, hence link.

Recall that in the GLM the response and the linear predictor are related
through a link function:

g(y) = eta

so for your model

logit(y) = eta

where eta is the linear predictor: beta_0 + beta_1 * x (in your
example).

beta_0 + beta_1 * x gives us the fitted value but on the untransformed
scale. This is the value given by predict() when type = link is used.
To get the predicted value on the response scale, we apply the inverse
of the link function g():

y = g(eta)^-1

The inverse of the logit function is the inverse-logit:

logit(eta)^-1 = exp(eta) / (1 + exp(eta))

So in R, we can see the relationship through a few simple commands.
First, the prediction for x = 0.5 on the response scale:

 predict(mylogit, newdata = list(x = 0.5), type = response)
1 
0.8149848 

Then we compute the same prediction but on the scale of the link
function:

 (p1 - predict(mylogit, newdata = list(x = 0.5), type = link))
   1 
1.482732 

to which we apply the inverse-logit function giving us the same value we
got earlier for type = response:

 exp(p1) / (1 + exp(p1))
1 
0.8149848 

There is a function to do this for us, however, called plogis()

 plogis(p1)
1 
0.8149848

One reason why you might want predictions on the scale of the link
function is for computation of confidence intervals using normal theory
(e.g. 2-sigma ~95% confidence intervals). On the response scale these
should be asymmetric and respect the scale of the response (bounding
etc), so you compute them on the link scale and apply the inverse of the
link function to get them on to the response scale.

HTH

G

 
 
 
 On Wed, Jan 26, 2011 at 11:41 PM, Gavin Simpson
 gavin.simp...@ucl.ac.uk wrote:
 On Wed, 2011-01-26 at 19:25 -1000, Ahnate Lim wrote:
  Even when I try to predict y values from x, let's say I want
 to predict y at
  x=0. Looking at the graph from the provided syntax, I would
 expect y to be
  about 0.85. Is this right:
 
  predict(mylogit,newdata=as.data.frame(0),type=response)
 
 
 Your original problem was the use of `newdata =
 as.data.frame(0.5)`.
 There are two problems here: i) if you don't name the input (x
 = 0.5,
 say) then you get a data frame with the name(s) 0.5:
 
  as.data.frame(0.5)
 
  0.5
 1 0.5
 
 and ii) if you do name it, you still get a data frame with
 name(s) 0.5
 
  as.data.frame(x = 0.5)
  0.5
 1 0.5
 
 In both cases, predict wants to find a variable with the name
 `x` in the
 object supplied to `newdata`. It finds `x` but your `x` in the
 global
 workspace, but warns because it knows that `newdata` was a
 data frame
 with a single row - so there was a mismatch and you likely
 made a
 mistake.
 
 In these cases, `data.frame()` is preferred to
 `as.data.frame()`:
 
 predict(mylogit, newdata = data.frame(x = 0), type =
 response)
 
 or we can use a list, to save a few characters:
 
 predict(mylogit, newdata = list(x = 0), type = response)
 
 which give:
 
  predict(mylogit, newdata = list(x = 0), type = response)
   1
 0.813069
  predict(mylogit, newdata = data.frame(x = 0), type =
 response)
   1
 0.813069
 
 In summary, use `data.frame()` or `list()` to create the
 object passed
 as `newdata` and make sure you give the component containing
 the new
 values a *name* that matches the predictor in the model
 formula.
 
 HTH
 
 
 G
 
 
  # I get:
 
  Warning message:
  'newdata' had 1 rows but variable(s) found have 34 rows
 
  # Why do I need 34 rows? So I try:
 
  v=vector()
  v[1:34]=0
  predict(mylogit,newdata=as.data.frame(v),type=response)
 
  # And I get a matrix of 34 values that appear to fit a
 logistic function,
  instead of 0.85..
 
 
 
 
  On Wed, Jan 26, 2011 at 6:59 PM, David Winsemius
 dwinsem...@comcast.netwrote:
 
  
   On Jan 26, 2011, at 10:52 PM

Re: [R] agnes clustering and NAs

2011-01-27 Thread Gavin Simpson
On Thu, 2011-01-27 at 10:45 +0100, Uwe Ligges wrote:
 
 On 27.01.2011 05:00, Dario Strbenac wrote:
  Hello,
 
  In the documentation for agnes in the package 'cluster', it says that NAs 
  are allowed, and sure enough it works for a small example like :
 
  m- matrix(c(
  1, 1, 1, 2,
  1, NA, 1, 1,
  1, 2, 2, 2), nrow = 3, byrow = TRUE)
  agnes(m)
  Call:agnes(x = m)
  Agglomerative coefficient:  0.1614168
  Order of objects:
  [1] 1 2 3
  Height (summary):
  Min. 1st Qu.  MedianMean 3rd Qu.Max.
 1.155   1.247   1.339   1.339   1.431   1.524
 
  Available components:
  [1] order  height ac merge  diss   call   method data
 
  But I have a large matrix (23371 rows, 50 columns) with some NAs in it and 
  it runs for about a minute, then gives an error :
 
  agnes(iMatrix)
  Error in agnes(iMatrix) :
 No clustering performed, NA-values in the dissimilarity matrix.
 
  I've also tried getting rid of rows with all NAs in them, and it still gave 
  me the same error. Is this a bug in agnes() ? It doesn't seem to fulfil the 
  claim made by its documentation.
 
 
 I haven't looked in the file, but you need to get rid of all NA, or in 
 other words, all rows that contain *any* NA values.

If one believes the documentation, then that only applies to the case
where `x` is a dissimilarity matrix. `NA`s are allowed if x is the raw
data matrix or data frame.

The only way the OP could have gotten that error with the call shown is
if iMatrix were not a dissimilarity matrix inheriting from class dist,
so `NA`s should be allowed.

My guess would be that the OP didn't get rid of all the `NA`s.

Dario: what does:

sapply(iMatrix, function(x) any(is.na(x)))

or if iMatrix is a matrix:

apply(iMatrix, 2, function(x) any(is.na(x)))

say?

G

 Uwe Ligges
 
 
 
  The matrix I'm using can be obtained here :
  http://129.94.136.7/file_dump/dario/iMatrix.obj
 
  --
  Dario Strbenac
  Research Assistant
  Cancer Epigenetics
  Garvan Institute of Medical Research
  Darlinghurst NSW 2010
  Australia
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   3   4   5   6   7   >