Re: [R-sig-eco] Cosinor with data that trend over time

2014-03-30 Thread Jacob Cram
Thanks Gavin,
Your suggestion seems promising.  I don't think I'll do the derivatives
analysis at this time so no hurry on those codes.
New question.  I am wondering if there are additional considerations when
making multiple comparasons with a gamm model.
I have been testing a large number of parameters to see which of say 20
environmental variables are seasonal. I would expect a number of these
variables to be seasonal but based on this conversation
http://stats.stackexchange.com/questions/49052/are-splines-overfitting-the-data
I am nervous about the variety of possible spline fits somehow compounding
the problem of testing multiple variables for seasonality.  Any thoughts?
-Jacob

On Wed, Mar 26, 2014 at 2:34 PM, Gavin Simpson ucfa...@gmail.com wrote:

 Sorry about the errors (typos, not syntax errors) - I was forgetting
 that you'd need to use `gamm()` and hence access the `$gam` component

 I don't follow the point about a factor trending up or down. You
 shouldn't try to use the `$lme` part of the model for this.
 `summary(mod$gam)` should be sufficient, but as it relates to a
 spline, this is more a test of whether the spline is different from a
 horizontal, flat, null line. The problem with splines is that the
 trend need not just be trending up or down. In the past, to convey
 where change in the trend occurs I have used the first derivative of
 the fitted spline and looked for where in time the 95% confidence
 interval on the first derivative of the spline doesn't include zero;
 that shows the regions in time where the trend is significantly
 increasing or decreasing. I cover how to do this in a blog post I
 wrote:


 http://www.fromthebottomoftheheap.net/2011/06/12/additive-modelling-and-the-hadcrut3v-global-mean-temperature-series/

 the post contains links to the R code used for the derivatives etc,
 though it is a little more complex in the case of a model with a trend
 spline and seasonal spline.

 I'm supposed to have updated those codes and the post because several
 people have asked me how I do the analysis for models with multiple
 spline terms. If you can't get the code to work for your models, ping
 me back and I'll try to move that to the top of my TO DO list.

 Note the `Xs(time)Fx1` entries in the `summary(mod$lme)` table refer
 to the basis functions that represent the spline or at least to some
 part of those basis functions. You can't really make much practical
 use out of those values are they relate specifically to way the
 penalised regression spline model has been converted into an
 equivalent linear mixed effect form.

 HTH

 G

 On 26 March 2014 12:10, Jacob Cram cramj...@gmail.com wrote:
  Thanks again Gavin, this works.
  gamm() also models the long term trend with a spline s(Time), which is
  great. I would still like though, to be able to say whether the factor is
  trending up or down over time.  Would it be fair to query
  summary(mod$lme)$tTable
  and to look at the p-value and Value corresponding Xs(time)Fx1 value to
  identify such a trend?
 
  Also, here are a few syntax corrections on the code provided in the last
  email:
  1)visual appoach
  plot(mod$gam, pages = 1)
 
  2) quantitative approach
  pred - predict(mod$gam, newdata = newdat, type = terms)
  take - which(pred[,s(DoY)] == max(pred[,s(DoY)]))
  or
  take - as.numeric(which.max(pred[,s(DoY)]))
 
  Cheers,
  -Jacob
 
 
  On Wed, Mar 26, 2014 at 9:46 AM, Gavin Simpson ucfa...@gmail.com
 wrote:
 
  1) Visually - unless it actually matters exactly on which day in the
  year the peak is observed? If visually is OK, just do `plot(mod, pages
  = 1)` to see the fitted splines on a single page. See `?plot.gam` for
  more details on the plot method.
 
  2) You could generate some new data to predict upon as follows:
 
  newdat - data.frame(DoY = seq_len(366), time = mean(foo$time))
 
  Then predict for those new data but collect the individual
  contributions of the spline terms to the predicted value rather than
  just the final prediction
 
  pred - predict(mod, newdata = newdat, type = terms)
 
  Then find the maximal value of the DoY contribution
 
  take - which(pred$DoY == max(pred$DoY))
  newdat[take, , drop = FALSE]
 
  You could use
 
  take - which.max(pred$DoY)
 
  instead of the `which()` I used, but only if there is a single maximal
  value.
 
  This works because the spline terms in the additive model are just
  that; additive. Hence because you haven't let the DoY and time splines
  interact (in the simple model I mentioned, it is more complex if you
  allow these to interact as you then need to predict DoY for each years
  worth of time points), you can separate DoY from the other terms.
 
  None of the above code has been tested, but was written off top of my
  head, but should work or at least get you pretty close to something
  that works.
 
  HTH
 
  G
 
  On 26 March 2014 10:02, Jacob Cram cramj...@gmail.com wrote:
   Thanks Gavin,
This seems like a promising 

Re: [R-sig-eco] Cosinor with data that trend over time

2014-03-26 Thread Gavin Simpson
1) Visually - unless it actually matters exactly on which day in the
year the peak is observed? If visually is OK, just do `plot(mod, pages
= 1)` to see the fitted splines on a single page. See `?plot.gam` for
more details on the plot method.

2) You could generate some new data to predict upon as follows:

newdat - data.frame(DoY = seq_len(366), time = mean(foo$time))

Then predict for those new data but collect the individual
contributions of the spline terms to the predicted value rather than
just the final prediction

pred - predict(mod, newdata = newdat, type = terms)

Then find the maximal value of the DoY contribution

take - which(pred$DoY == max(pred$DoY))
newdat[take, , drop = FALSE]

You could use

take - which.max(pred$DoY)

instead of the `which()` I used, but only if there is a single maximal value.

This works because the spline terms in the additive model are just
that; additive. Hence because you haven't let the DoY and time splines
interact (in the simple model I mentioned, it is more complex if you
allow these to interact as you then need to predict DoY for each years
worth of time points), you can separate DoY from the other terms.

None of the above code has been tested, but was written off top of my
head, but should work or at least get you pretty close to something
that works.

HTH

G

On 26 March 2014 10:02, Jacob Cram cramj...@gmail.com wrote:
 Thanks Gavin,
  This seems like a promising approach and a first pass suggests it works
 with this data. I can't quite figure out how I would go about interrogating
 the fitted spline to deterine when the peak value happens with respect to
 DoY.  Any suggestions?
 -Jacob


 On Tue, Mar 25, 2014 at 9:06 PM, Gavin Simpson ucfa...@gmail.com wrote:

 I would probably attack this using a GAM modified to model the
 residuals as a stochastic time series process.

 For example

 require(mgcv)
 mod - gamm(y ~ s(DoY, bs = cc) + s(time), data = foo,
  correlation = corCAR1(form = ~ time))

 where `foo` is your data frame, `DoY` is a variable in the data frame
 computed as `as.numeric(strftime(RDate, format = %j))` and `time` is
 a variable for the passage of time - you could do `as.numeric(RDate)`
 but the number of days is probably large as we might encounter more
 problems fitting the model. Instead you might do `as.numeric(RDate) /
 1000` say to produce values on a more manageable scale. The `bs =
 cc` bit specifies a cyclic spline applicable to data measured
 throughout a year. You may want to fix the start and end knots to be
 days 1 and days 366 respectively, say via `knots = list(DoY =
 c(0,366))` as an argument to `gam()` [I think I have this right,
 specifying the boundary knots, but let me know if you get an error
 about the number of knots]. The residuals are said to follow a
 continuois time AR(1), the irregular-spaced counter part to the AR(1),
 plus random noise.

 There may be identifiability issues as the `s(time)` and `corCAR1()`
 compete to explain the fine-scale variation. If you hit such a case,
 you can make an educated guess as to the wiggliness (degrees of
 freedom) for the smooth terms based on a plot of the data and fix the
 splines at those values via argument `k = x` and `fx = TRUE`, where
 `x` in `k = x` is some integer value. Both these go in as arguments to
 the `s()` functions. If the trend is not very non-linear you can use a
 low value 1-3 here for x and for the DoY term say 3-4 might be
 applicable.

 There are other ways to approach this problem of identifiability, but
 that would require more time/space here, which I can go into via a
 follow-up if needed.

 You can interrogate the fitted splines to see when the peak value of
 the `DoY` term is in the year.

 You can also allow the seasonal signal to vary in time with the trend
 by allowing the splines to interact in a 2d-tensor product spline.
 Using `te(DoY, time, bs = c(cc,cr))` instead of the two `s()`
 terms (or using `ti()` terms for the two marginal splines and the
 2-d spline). Again you can add in the `k` = c(x,y), fx = TRUE)` to the
 `te()` term where `x` and `y` are the dfs for each dimension in the
 `te()` term. It is a bit more complex to do this for `ti()` terms.

 Part of the reason to prefer a spline for DoY for the seasonal term is
 that one might not expect the seasonal cycle to be a symmetric cycle
 as a cos/sin terms would imply.

 A recent ecological paper describing a similar approach (though using
 different package in R) is that of Claire Ferguson and colleagues in J
 Applied Ecology (2008) http://doi.org/10./j.1365-2664.2007.01428.x
 (freely available).

 HTH

 G

 On 25 March 2014 19:14, Jacob Cram cramj...@gmail.com wrote:
  Hello all,
   I am thinking about applying season::cosinor() analysis to some
  irregularely spaced time series data. The data are unevenly spaced, so
  usual time series methods, as well as the nscosinor() function are out.
  My
  data do however trend over time 

Re: [R-sig-eco] Cosinor with data that trend over time

2014-03-26 Thread Jacob Cram
Thanks again Gavin, this works.
gamm() also models the long term trend with a spline s(Time), which is
great. I would still like though, to be able to say whether the factor is
trending up or down over time.  Would it be fair to query
summary(mod$lme)$tTable
and to look at the p-value and Value corresponding Xs(time)Fx1 value to
identify such a trend?

Also, here are a few syntax corrections on the code provided in the last
email:
1)visual appoach
plot(mod$gam, pages = 1)

2) quantitative approach
pred - predict(mod$gam, newdata = newdat, type = terms)
take - which(pred[,s(DoY)] == max(pred[,s(DoY)]))
or
take - as.numeric(which.max(pred[,s(DoY)]))

Cheers,
-Jacob


On Wed, Mar 26, 2014 at 9:46 AM, Gavin Simpson ucfa...@gmail.com wrote:

 1) Visually - unless it actually matters exactly on which day in the
 year the peak is observed? If visually is OK, just do `plot(mod, pages
 = 1)` to see the fitted splines on a single page. See `?plot.gam` for
 more details on the plot method.

 2) You could generate some new data to predict upon as follows:

 newdat - data.frame(DoY = seq_len(366), time = mean(foo$time))

 Then predict for those new data but collect the individual
 contributions of the spline terms to the predicted value rather than
 just the final prediction

 pred - predict(mod, newdata = newdat, type = terms)

 Then find the maximal value of the DoY contribution

 take - which(pred$DoY == max(pred$DoY))
 newdat[take, , drop = FALSE]

 You could use

 take - which.max(pred$DoY)

 instead of the `which()` I used, but only if there is a single maximal
 value.

 This works because the spline terms in the additive model are just
 that; additive. Hence because you haven't let the DoY and time splines
 interact (in the simple model I mentioned, it is more complex if you
 allow these to interact as you then need to predict DoY for each years
 worth of time points), you can separate DoY from the other terms.

 None of the above code has been tested, but was written off top of my
 head, but should work or at least get you pretty close to something
 that works.

 HTH

 G

 On 26 March 2014 10:02, Jacob Cram cramj...@gmail.com wrote:
  Thanks Gavin,
   This seems like a promising approach and a first pass suggests it
 works
  with this data. I can't quite figure out how I would go about
 interrogating
  the fitted spline to deterine when the peak value happens with respect to
  DoY.  Any suggestions?
  -Jacob
 
 
  On Tue, Mar 25, 2014 at 9:06 PM, Gavin Simpson ucfa...@gmail.com
 wrote:
 
  I would probably attack this using a GAM modified to model the
  residuals as a stochastic time series process.
 
  For example
 
  require(mgcv)
  mod - gamm(y ~ s(DoY, bs = cc) + s(time), data = foo,
   correlation = corCAR1(form = ~ time))
 
  where `foo` is your data frame, `DoY` is a variable in the data frame
  computed as `as.numeric(strftime(RDate, format = %j))` and `time` is
  a variable for the passage of time - you could do `as.numeric(RDate)`
  but the number of days is probably large as we might encounter more
  problems fitting the model. Instead you might do `as.numeric(RDate) /
  1000` say to produce values on a more manageable scale. The `bs =
  cc` bit specifies a cyclic spline applicable to data measured
  throughout a year. You may want to fix the start and end knots to be
  days 1 and days 366 respectively, say via `knots = list(DoY =
  c(0,366))` as an argument to `gam()` [I think I have this right,
  specifying the boundary knots, but let me know if you get an error
  about the number of knots]. The residuals are said to follow a
  continuois time AR(1), the irregular-spaced counter part to the AR(1),
  plus random noise.
 
  There may be identifiability issues as the `s(time)` and `corCAR1()`
  compete to explain the fine-scale variation. If you hit such a case,
  you can make an educated guess as to the wiggliness (degrees of
  freedom) for the smooth terms based on a plot of the data and fix the
  splines at those values via argument `k = x` and `fx = TRUE`, where
  `x` in `k = x` is some integer value. Both these go in as arguments to
  the `s()` functions. If the trend is not very non-linear you can use a
  low value 1-3 here for x and for the DoY term say 3-4 might be
  applicable.
 
  There are other ways to approach this problem of identifiability, but
  that would require more time/space here, which I can go into via a
  follow-up if needed.
 
  You can interrogate the fitted splines to see when the peak value of
  the `DoY` term is in the year.
 
  You can also allow the seasonal signal to vary in time with the trend
  by allowing the splines to interact in a 2d-tensor product spline.
  Using `te(DoY, time, bs = c(cc,cr))` instead of the two `s()`
  terms (or using `ti()` terms for the two marginal splines and the
  2-d spline). Again you can add in the `k` = c(x,y), fx = TRUE)` to the
  `te()` term where `x` and `y` are the dfs for each 

Re: [R-sig-eco] Cosinor with data that trend over time

2014-03-26 Thread Gavin Simpson
Sorry about the errors (typos, not syntax errors) - I was forgetting
that you'd need to use `gamm()` and hence access the `$gam` component

I don't follow the point about a factor trending up or down. You
shouldn't try to use the `$lme` part of the model for this.
`summary(mod$gam)` should be sufficient, but as it relates to a
spline, this is more a test of whether the spline is different from a
horizontal, flat, null line. The problem with splines is that the
trend need not just be trending up or down. In the past, to convey
where change in the trend occurs I have used the first derivative of
the fitted spline and looked for where in time the 95% confidence
interval on the first derivative of the spline doesn't include zero;
that shows the regions in time where the trend is significantly
increasing or decreasing. I cover how to do this in a blog post I
wrote:

http://www.fromthebottomoftheheap.net/2011/06/12/additive-modelling-and-the-hadcrut3v-global-mean-temperature-series/

the post contains links to the R code used for the derivatives etc,
though it is a little more complex in the case of a model with a trend
spline and seasonal spline.

I'm supposed to have updated those codes and the post because several
people have asked me how I do the analysis for models with multiple
spline terms. If you can't get the code to work for your models, ping
me back and I'll try to move that to the top of my TO DO list.

Note the `Xs(time)Fx1` entries in the `summary(mod$lme)` table refer
to the basis functions that represent the spline or at least to some
part of those basis functions. You can't really make much practical
use out of those values are they relate specifically to way the
penalised regression spline model has been converted into an
equivalent linear mixed effect form.

HTH

G

On 26 March 2014 12:10, Jacob Cram cramj...@gmail.com wrote:
 Thanks again Gavin, this works.
 gamm() also models the long term trend with a spline s(Time), which is
 great. I would still like though, to be able to say whether the factor is
 trending up or down over time.  Would it be fair to query
 summary(mod$lme)$tTable
 and to look at the p-value and Value corresponding Xs(time)Fx1 value to
 identify such a trend?

 Also, here are a few syntax corrections on the code provided in the last
 email:
 1)visual appoach
 plot(mod$gam, pages = 1)

 2) quantitative approach
 pred - predict(mod$gam, newdata = newdat, type = terms)
 take - which(pred[,s(DoY)] == max(pred[,s(DoY)]))
 or
 take - as.numeric(which.max(pred[,s(DoY)]))

 Cheers,
 -Jacob


 On Wed, Mar 26, 2014 at 9:46 AM, Gavin Simpson ucfa...@gmail.com wrote:

 1) Visually - unless it actually matters exactly on which day in the
 year the peak is observed? If visually is OK, just do `plot(mod, pages
 = 1)` to see the fitted splines on a single page. See `?plot.gam` for
 more details on the plot method.

 2) You could generate some new data to predict upon as follows:

 newdat - data.frame(DoY = seq_len(366), time = mean(foo$time))

 Then predict for those new data but collect the individual
 contributions of the spline terms to the predicted value rather than
 just the final prediction

 pred - predict(mod, newdata = newdat, type = terms)

 Then find the maximal value of the DoY contribution

 take - which(pred$DoY == max(pred$DoY))
 newdat[take, , drop = FALSE]

 You could use

 take - which.max(pred$DoY)

 instead of the `which()` I used, but only if there is a single maximal
 value.

 This works because the spline terms in the additive model are just
 that; additive. Hence because you haven't let the DoY and time splines
 interact (in the simple model I mentioned, it is more complex if you
 allow these to interact as you then need to predict DoY for each years
 worth of time points), you can separate DoY from the other terms.

 None of the above code has been tested, but was written off top of my
 head, but should work or at least get you pretty close to something
 that works.

 HTH

 G

 On 26 March 2014 10:02, Jacob Cram cramj...@gmail.com wrote:
  Thanks Gavin,
   This seems like a promising approach and a first pass suggests it
  works
  with this data. I can't quite figure out how I would go about
  interrogating
  the fitted spline to deterine when the peak value happens with respect
  to
  DoY.  Any suggestions?
  -Jacob
 
 
  On Tue, Mar 25, 2014 at 9:06 PM, Gavin Simpson ucfa...@gmail.com
  wrote:
 
  I would probably attack this using a GAM modified to model the
  residuals as a stochastic time series process.
 
  For example
 
  require(mgcv)
  mod - gamm(y ~ s(DoY, bs = cc) + s(time), data = foo,
   correlation = corCAR1(form = ~ time))
 
  where `foo` is your data frame, `DoY` is a variable in the data frame
  computed as `as.numeric(strftime(RDate, format = %j))` and `time` is
  a variable for the passage of time - you could do `as.numeric(RDate)`
  but the number of days is probably large as we might encounter 

[R-sig-eco] Cosinor with data that trend over time

2014-03-25 Thread Jacob Cram
Hello all,
 I am thinking about applying season::cosinor() analysis to some
irregularely spaced time series data. The data are unevenly spaced, so
usual time series methods, as well as the nscosinor() function are out. My
data do however trend over time and I am wondering if I can feed date as a
variable into my cosinor analyis.  In the example below, then I'd conclude
then that the abundances are seasonal, with maximal abundance in mid June
and furthermore, they are generally decreasing over time.
Can I use both time variables together like this? If not, is there some
better approach I should take?
Thanks in advance,
-Jacob

For context lAbundance is logg-odds transformed abundance data of a
microbial species in a given location over time.  RDate is the date the
sample was collected in the r date format.


 res - cosinor(lAbundance ~ RDate, date = RDate, data = lldata)

summary(res)
Cosinor test
Number of observations = 62
Amplitude = 0.58
Phase: Month = June , day = 14
Low point: Month = December , day = 14
Significant seasonality based on adjusted significance level of 0.025  =
TRUE

summary(res$glm)


Call:
glm(formula = f, family = family, data = data, offset = offset)

Deviance Residuals:
Min   1Q   Median   3Q  Max
-2.3476  -0.6463   0.1519   0.6574   1.9618

Coefficients:
  Estimate Std. Error t value Pr(|t|)
(Intercept)  0.0115693  1.8963070   0.006  0.99515
RDate   -0.0003203  0.0001393  -2.299  0.02514 *
cosw-0.5516458  0.1837344  -3.002  0.00395 **
sinw 0.1762904  0.1700670   1.037  0.30423
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.9458339)

Null deviance: 70.759  on 61  degrees of freedom
Residual deviance: 54.858  on 58  degrees of freedom
AIC: 178.36

Number of Fisher Scoring iterations: 2


llldata as a csv below
RDate,lAbundance
2003-03-12,-3.3330699059335
2003-05-21,-3.04104625745886
2003-06-17,-3.04734680029566
2003-07-02,-4.18791034708572
2003-09-18,-3.04419201802053
2003-10-22,-3.13805060873929
2004-02-19,-3.80688269144794
2004-03-17,-4.50755507726145
2004-04-22,-4.38846502542992
2004-05-19,-3.06618649442674
2004-06-17,-5.20518774876304
2004-07-14,-3.75041853151097
2004-08-25,-3.67882486716196
2004-09-22,-5.22205827512234
2004-10-14,-3.99297508670535
2004-11-17,-4.68793287601157
2004-12-15,-4.31712380781011
2005-02-16,-4.30893550479904
2005-03-16,-4.05781773988454
2005-05-11,-3.94746237402035
2005-07-19,-4.91195185391358
2005-08-17,-4.93590576323119
2005-09-15,-4.85820800095518
2005-10-20,-5.22956391101343
2005-12-13,-5.12244047315448
2006-01-18,-3.04854660925046
2006-02-22,-6.77145858348375
2006-03-29,-4.33151493849021
2006-04-19,-3.36152357710535
2006-06-20,-3.09071584142593
2006-07-25,-3.31430484483825
2006-08-24,-3.09974933041469
2006-09-13,-3.33288992218458
2007-12-17,-4.19942661980677
2008-03-19,-3.86146499633625
2008-04-22,-3.36161599919095
2008-05-14,-4.30878307213324
2008-06-18,-3.74372448768828
2008-07-09,-4.65951429661651
2008-08-20,-5.35984647704619
2008-09-22,-4.78481898261137
2008-10-20,-3.58588161980965
2008-11-20,-3.10625125552057
2009-02-18,-6.90675477864855
2009-03-11,-3.43446932013368
2009-04-23,-3.82688066341466
2009-05-13,-4.44885332005661
2009-06-18,-3.97671552612412
2009-07-09,-3.40185954003936
2009-08-19,-3.44958231694091
2009-09-24,-3.86508161094726
2010-01-28,-4.95281587967569
2010-02-11,-3.78064756876257
2010-03-24,-3.5823501176064
2010-04-27,-4.3363571587
2010-05-17,-3.90545735473055
2010-07-21,-3.3147176517321
2010-08-11,-4.53218360860017
2010-10-21,-6.90675477864855
2010-11-23,-6.90675477864855
2010-12-16,-6.75158176094352
2011-01-11,-6.90675477864855

[[alternative HTML version deleted]]

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Cosinor with data that trend over time

2014-03-25 Thread Gavin Simpson
I would probably attack this using a GAM modified to model the
residuals as a stochastic time series process.

For example

require(mgcv)
mod - gamm(y ~ s(DoY, bs = cc) + s(time), data = foo,
 correlation = corCAR1(form = ~ time))

where `foo` is your data frame, `DoY` is a variable in the data frame
computed as `as.numeric(strftime(RDate, format = %j))` and `time` is
a variable for the passage of time - you could do `as.numeric(RDate)`
but the number of days is probably large as we might encounter more
problems fitting the model. Instead you might do `as.numeric(RDate) /
1000` say to produce values on a more manageable scale. The `bs =
cc` bit specifies a cyclic spline applicable to data measured
throughout a year. You may want to fix the start and end knots to be
days 1 and days 366 respectively, say via `knots = list(DoY =
c(0,366))` as an argument to `gam()` [I think I have this right,
specifying the boundary knots, but let me know if you get an error
about the number of knots]. The residuals are said to follow a
continuois time AR(1), the irregular-spaced counter part to the AR(1),
plus random noise.

There may be identifiability issues as the `s(time)` and `corCAR1()`
compete to explain the fine-scale variation. If you hit such a case,
you can make an educated guess as to the wiggliness (degrees of
freedom) for the smooth terms based on a plot of the data and fix the
splines at those values via argument `k = x` and `fx = TRUE`, where
`x` in `k = x` is some integer value. Both these go in as arguments to
the `s()` functions. If the trend is not very non-linear you can use a
low value 1-3 here for x and for the DoY term say 3-4 might be
applicable.

There are other ways to approach this problem of identifiability, but
that would require more time/space here, which I can go into via a
follow-up if needed.

You can interrogate the fitted splines to see when the peak value of
the `DoY` term is in the year.

You can also allow the seasonal signal to vary in time with the trend
by allowing the splines to interact in a 2d-tensor product spline.
Using `te(DoY, time, bs = c(cc,cr))` instead of the two `s()`
terms (or using `ti()` terms for the two marginal splines and the
2-d spline). Again you can add in the `k` = c(x,y), fx = TRUE)` to the
`te()` term where `x` and `y` are the dfs for each dimension in the
`te()` term. It is a bit more complex to do this for `ti()` terms.

Part of the reason to prefer a spline for DoY for the seasonal term is
that one might not expect the seasonal cycle to be a symmetric cycle
as a cos/sin terms would imply.

A recent ecological paper describing a similar approach (though using
different package in R) is that of Claire Ferguson and colleagues in J
Applied Ecology (2008) http://doi.org/10./j.1365-2664.2007.01428.x
(freely available).

HTH

G

On 25 March 2014 19:14, Jacob Cram cramj...@gmail.com wrote:
 Hello all,
  I am thinking about applying season::cosinor() analysis to some
 irregularely spaced time series data. The data are unevenly spaced, so
 usual time series methods, as well as the nscosinor() function are out. My
 data do however trend over time and I am wondering if I can feed date as a
 variable into my cosinor analyis.  In the example below, then I'd conclude
 then that the abundances are seasonal, with maximal abundance in mid June
 and furthermore, they are generally decreasing over time.
 Can I use both time variables together like this? If not, is there some
 better approach I should take?
 Thanks in advance,
 -Jacob

 For context lAbundance is logg-odds transformed abundance data of a
 microbial species in a given location over time.  RDate is the date the
 sample was collected in the r date format.


 res - cosinor(lAbundance ~ RDate, date = RDate, data = lldata)

summary(res)
 Cosinor test
 Number of observations = 62
 Amplitude = 0.58
 Phase: Month = June , day = 14
 Low point: Month = December , day = 14
 Significant seasonality based on adjusted significance level of 0.025  =
 TRUE

summary(res$glm)


 Call:
 glm(formula = f, family = family, data = data, offset = offset)

 Deviance Residuals:
 Min   1Q   Median   3Q  Max
 -2.3476  -0.6463   0.1519   0.6574   1.9618

 Coefficients:
   Estimate Std. Error t value Pr(|t|)
 (Intercept)  0.0115693  1.8963070   0.006  0.99515
 RDate   -0.0003203  0.0001393  -2.299  0.02514 *
 cosw-0.5516458  0.1837344  -3.002  0.00395 **
 sinw 0.1762904  0.1700670   1.037  0.30423
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 (Dispersion parameter for gaussian family taken to be 0.9458339)

 Null deviance: 70.759  on 61  degrees of freedom
 Residual deviance: 54.858  on 58  degrees of freedom
 AIC: 178.36

 Number of Fisher Scoring iterations: 2


 llldata as a csv below
 RDate,lAbundance
 2003-03-12,-3.3330699059335
 2003-05-21,-3.04104625745886
 2003-06-17,-3.04734680029566
 2003-07-02,-4.18791034708572