Thanks Gavin,
    Your suggestion seems promising.  I don't think I'll do the derivatives
analysis at this time, so no hurry on those codes.
New question: I am wondering whether there are additional considerations when
making multiple comparisons with a GAMM.
I have been testing a large number of parameters to see which of, say, 20
environmental variables are seasonal. I would expect a number of these
variables to be seasonal, but based on this conversation
http://stats.stackexchange.com/questions/49052/are-splines-overfitting-the-data
I am nervous about the variety of possible spline fits somehow compounding
the problem of testing multiple variables for seasonality.  Any thoughts?
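
(To make the question concrete, here is a rough, untested sketch of the kind of
screening loop I have in mind; the data frame `dat`, the variable names, and the
BH adjustment are only placeholders, not something from the earlier messages:)

    library(mgcv)
    ## hypothetical: `dat` holds DoY, time, and one column per environmental variable
    vars <- c("temperature", "salinity", "nitrate")   # stand-ins for the ~20 variables
    pvals <- sapply(vars, function(v) {
        f <- reformulate(c('s(DoY, bs = "cc")', "s(time)"), response = v)
        m <- gamm(f, data = dat, correlation = corCAR1(form = ~ time))
        summary(m$gam)$s.table["s(DoY)", "p-value"]   # p-value for the seasonal smooth
    })
    p.adjust(pvals, method = "BH")   # adjust across the ~20 seasonality tests
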
-Jacob

On Wed, Mar 26, 2014 at 2:34 PM, Gavin Simpson <ucfa...@gmail.com> wrote:

> Sorry about the errors (typos, not syntax errors) - I was forgetting
> that you'd need to use `gamm()` and hence access the `$gam` component.
>
> I don't follow the point about a factor trending up or down. You
> shouldn't try to use the `$lme` part of the model for this.
> `summary(mod$gam)` should be sufficient, but as it relates to a
> spline, this is more a test of whether the spline is different from a
> horizontal, flat, null line. The problem with splines is that the
> trend need not just be "trending up or down". In the past, to convey
> where change in the trend occurs I have used the first derivative of
> the fitted spline and looked for where in time the 95% confidence
> interval on the first derivative of the spline doesn't include zero;
> that shows the regions in time where the trend is significantly
> increasing or decreasing. I cover how to do this in a blog post I
> wrote:
>
>
> http://www.fromthebottomoftheheap.net/2011/06/12/additive-modelling-and-the-hadcrut3v-global-mean-temperature-series/
>
> the post contains links to the R code used for the derivatives etc,
> though it is a little more complex in the case of a model with a trend
> spline and seasonal spline.
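>
> (A rough, untested sketch of the finite-difference idea - not the code from
> the post - assuming a fitted `gamm()` object `mod` with an `s(time)` term and
> a data frame `foo` as in the earlier messages:)
>
>     eps   <- 1e-4
>     newd0 <- data.frame(time = seq(min(foo$time), max(foo$time), length = 200),
>                         DoY = 182)                     # hold the seasonal term fixed
>     newd1 <- transform(newd0, time = time + eps)
>     X0 <- predict(mod$gam, newd0, type = "lpmatrix")   # basis evaluated at newd0
>     X1 <- predict(mod$gam, newd1, type = "lpmatrix")
>     want <- grep("s\\(time\\)", colnames(X0))          # columns of the time spline
>     Xd <- (X1[, want] - X0[, want]) / eps              # finite-difference derivative basis
>     d  <- Xd %*% coef(mod$gam)[want]                   # first derivative of the trend
>     se <- sqrt(rowSums((Xd %*% vcov(mod$gam)[want, want]) * Xd))
>     sig <- abs(d) > 1.96 * se   # TRUE where the 95% CI on the derivative excludes zero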
>
> I'm supposed to have updated those codes and the post because several
> people have asked me how I do the analysis for models with multiple
> spline terms. If you can't get the code to work for your models, ping
> me back and I'll try to move that to the top of my TO DO list.
>
> Note the `Xs(time)Fx1` entries in the `summary(mod$lme)` table refer
> to the basis functions that represent the spline, or at least to some
> part of those basis functions. You can't really make much practical
> use of those values, as they relate specifically to the way the
> penalised regression spline model has been converted into an
> equivalent linear mixed-effects form.
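>
> (If you want a single number to quote for the trend term, a rough sketch of
> what I'd look at instead - untested, assuming `mod` is your gamm() fit:)
>
>     summary(mod$gam)$s.table   # edf, F and p-value for each smooth, e.g. s(time)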
>
> HTH
>
> G
>
> On 26 March 2014 12:10, Jacob Cram <cramj...@gmail.com> wrote:
> > Thanks again Gavin, this works.
> > gamm() also models the long-term trend with a spline s(Time), which is
> > great. I would still like, though, to be able to say whether the factor is
> > trending up or down over time.  Would it be fair to query
> > summary(mod$lme)$tTable
> > and to look at the p-value and "Value" corresponding to the Xs(time)Fx1 row
> > to identify such a trend?
> >
> > Also, here are a few syntax corrections on the code provided in the last
> > email:
> > 1) visual approach
> > plot(mod$gam, pages = 1)
> >
> > 2) quantitative approach
> > pred <- predict(mod$gam, newdata = newdat, type = "terms")
> > take <- which(pred[,"s(DoY)"] == max(pred[,"s(DoY)"]))
> > or
> > take <- as.numeric(which.max(pred[,"s(DoY)"]))
> >
> > Cheers,
> > -Jacob
> >
> >
> > On Wed, Mar 26, 2014 at 9:46 AM, Gavin Simpson <ucfa...@gmail.com> wrote:
> >>
> >> 1) Visually - unless it actually matters exactly on which day in the
> >> year the peak is observed? If visually is OK, just do `plot(mod, pages
> >> = 1)` to see the fitted splines on a single page. See `?plot.gam` for
> >> more details on the plot method.
> >>
> >> 2) You could generate some new data to predict upon as follows:
> >>
> >>     newdat <- data.frame(DoY = seq_len(366), time = mean(foo$time))
> >>
> >> Then predict for those new data but collect the individual
> >> contributions of the spline terms to the predicted value rather than
> >> just the final prediction
> >>
> >>     pred <- predict(mod, newdata = newdat, type = "terms")
> >>
> >> Then find the maximal value of the DoY contribution
> >>
> >>     take <- which(pred$DoY == max(pred$DoY))
> >>     newdat[take, , drop = FALSE]
> >>
> >> You could use
> >>
> >>     take <- which.max(pred$DoY)
> >>
> >> instead of the `which()` I used, but only if there is a single maximal
> >> value.
> >>
> >> This works because the spline terms in the additive model are just
> >> that: additive. Hence, because you haven't let the DoY and time splines
> >> interact (in the simple model I mentioned; it is more complex if you
> >> allow these to interact, as you then need to predict DoY for each year's
> >> worth of time points), you can separate DoY from the other terms.
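> >>
> >> (If you did let them interact via `te(DoY, time)`, a rough, untested sketch
> >> of the sort of grid prediction I mean - names as in the earlier messages -
> >> would be:)
> >>
> >>     grid <- expand.grid(DoY = seq_len(366),
> >>                         time = quantile(foo$time, probs = c(0.1, 0.5, 0.9)))
> >>     grid$fit <- predict(mod$gam, newdata = grid)   # fitted values over the grid
> >>     ## peak day-of-year at each representative time point
> >>     sapply(split(grid, grid$time), function(d) d$DoY[which.max(d$fit)])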
> >>
> >> None of the above code has been tested - it was written off the top of my
> >> head - but it should work, or at least get you pretty close to something
> >> that works.
> >>
> >> HTH
> >>
> >> G
> >>
> >> On 26 March 2014 10:02, Jacob Cram <cramj...@gmail.com> wrote:
> >> > Thanks Gavin,
> >> >      This seems like a promising approach, and a first pass suggests it
> >> > works with this data. I can't quite figure out how I would go about
> >> > interrogating the fitted spline to determine when the peak value happens
> >> > with respect to DoY.  Any suggestions?
> >> > -Jacob
> >> >
> >> >
> >> > On Tue, Mar 25, 2014 at 9:06 PM, Gavin Simpson <ucfa...@gmail.com>
> >> > wrote:
> >> >>
> >> >> I would probably attack this using a GAM modified to model the
> >> >> residuals as a stochastic time series process.
> >> >>
> >> >> For example
> >> >>
> >> >> require("mgcv")
> >> >> mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo,
> >> >>                          correlation = corCAR1(form = ~ time))
> >> >>
> >> >> where `foo` is your data frame, `DoY` is a variable in the data frame
> >> >> computed as `as.numeric(strftime(RDate, format = "%j"))`, and `time` is
> >> >> a variable for the passage of time - you could do `as.numeric(RDate)`,
> >> >> but the number of days is probably large, so we might encounter more
> >> >> problems fitting the model. Instead you might do `as.numeric(RDate) /
> >> >> 1000`, say, to produce values on a more manageable scale. The `bs =
> >> >> "cc"` bit specifies a cyclic spline applicable to data measured
> >> >> throughout a year. You may want to fix the start and end knots to be
> >> >> days 1 and 366 respectively, say via `knots = list(DoY = c(0, 366))`
> >> >> as an argument to `gamm()` [I think I have this right, specifying the
> >> >> boundary knots, but let me know if you get an error about the number
> >> >> of knots]. The residuals are assumed to follow a continuous-time AR(1),
> >> >> the irregularly-spaced counterpart to the AR(1), plus random noise.
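> >> >>
> >> >> (For example, with the boundary knots fixed - an untested sketch:)
> >> >>
> >> >>     mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo,
> >> >>                 knots = list(DoY = c(0, 366)),
> >> >>                 correlation = corCAR1(form = ~ time))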
> >> >>
> >> >> There may be identifiability issues, as the `s(time)` and `corCAR1()`
> >> >> compete to explain the fine-scale variation. If you hit such a case,
> >> >> you can make an educated guess as to the wiggliness (degrees of
> >> >> freedom) for the smooth terms based on a plot of the data, and fix the
> >> >> splines at those values via the arguments `k = x` and `fx = TRUE`, where
> >> >> `x` in `k = x` is some integer value. Both of these go in as arguments
> >> >> to the `s()` functions. If the trend is not very non-linear you can use
> >> >> a low value of 1-3 here for x, and for the DoY term say 3-4 might be
> >> >> applicable.
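> >> >>
> >> >> (Something along these lines - the particular k values are just guesses:)
> >> >>
> >> >>     mod2 <- gamm(y ~ s(DoY, bs = "cc", k = 4, fx = TRUE) +
> >> >>                      s(time, k = 3, fx = TRUE),
> >> >>                  data = foo, correlation = corCAR1(form = ~ time))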
> >> >>
> >> >> There are other ways to approach this problem of identifiability, but
> >> >> that would require more time/space here, which I can go into via a
> >> >> follow-up if needed.
> >> >>
> >> >> You can interrogate the fitted splines to see when the peak value of
> >> >> the `DoY` term is in the year.
> >> >>
> >> >> You can also allow the seasonal signal to vary in time with the trend
> >> >> by allowing the splines to "interact" in a 2-d tensor product spline,
> >> >> using `te(DoY, time, bs = c("cc","cr"))` instead of the two `s()`
> >> >> terms (or using `ti()` terms for the two "marginal" splines and the
> >> >> 2-d spline). Again you can add `k = c(x, y), fx = TRUE` to the
> >> >> `te()` term, where `x` and `y` are the dfs for each dimension in the
> >> >> `te()` term. It is a bit more complex to do this for `ti()` terms.
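> >> >>
> >> >> (A sketch of the tensor-product version, again with made-up dfs:)
> >> >>
> >> >>     mod3 <- gamm(y ~ te(DoY, time, bs = c("cc", "cr"), k = c(6, 5)),
> >> >>                  data = foo, correlation = corCAR1(form = ~ time))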
> >> >>
> >> >> Part of the reason to prefer a spline for DoY for the seasonal term is
> >> >> that one might not expect the seasonal cycle to be symmetric, as
> >> >> cos/sin terms would imply.
> >> >>
> >> >> A recent ecological paper describing a similar approach (though using a
> >> >> different package in R) is that of Claire Ferguson and colleagues in
> >> >> J Applied Ecology (2008): http://doi.org/10.1111/j.1365-2664.2007.01428.x
> >> >> (freely available).
> >> >>
> >> >> HTH
> >> >>
> >> >> G
> >> >>
> >> >> On 25 March 2014 19:14, Jacob Cram <cramj...@gmail.com> wrote:
> >> >> > Hello all,
> >> >> >      I am thinking about applying season::cosinor() analysis to some
> >> >> > irregularly spaced time series data. The data are unevenly spaced, so
> >> >> > the usual time series methods, as well as the nscosinor() function, are
> >> >> > out. My data do, however, trend over time, and I am wondering if I can
> >> >> > feed date in as a variable in my cosinor analysis.  From the example
> >> >> > below, I'd conclude that the abundances are seasonal, with maximal
> >> >> > abundance in mid June, and furthermore that they are generally
> >> >> > decreasing over time.
> >> >> > Can I use both time variables together like this? If not, is there
> >> >> > some better approach I should take?
> >> >> > Thanks in advance,
> >> >> > -Jacob
> >> >> >
> >> >> > For context, lAbundance is log-odds transformed abundance data of a
> >> >> > microbial species in a given location over time.  RDate is the date
> >> >> > the sample was collected, in R's Date format.
> >> >> >
> >> >> >
> >> >> >> res <- cosinor(lAbundance ~ RDate, date = "RDate", data = lldata)
> >> >> >
> >> >> >>summary(res)
> >> >> > Cosinor test
> >> >> > Number of observations = 62
> >> >> > Amplitude = 0.58
> >> >> > Phase: Month = June , day = 14
> >> >> > Low point: Month = December , day = 14
> >> >> > Significant seasonality based on adjusted significance level of 0.025 = TRUE
> >> >> >
> >> >> >>summary(res$glm)
> >> >> >
> >> >> >
> >> >> > Call:
> >> >> > glm(formula = f, family = family, data = data, offset = offset)
> >> >> >
> >> >> > Deviance Residuals:
> >> >> >     Min       1Q   Median       3Q      Max
> >> >> > -2.3476  -0.6463   0.1519   0.6574   1.9618
> >> >> >
> >> >> > Coefficients:
> >> >> >               Estimate Std. Error t value Pr(>|t|)
> >> >> > (Intercept)  0.0115693  1.8963070   0.006  0.99515
> >> >> > RDate       -0.0003203  0.0001393  -2.299  0.02514 *
> >> >> > cosw        -0.5516458  0.1837344  -3.002  0.00395 **
> >> >> > sinw         0.1762904  0.1700670   1.037  0.30423
> >> >> > ---
> >> >> > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >> >> >
> >> >> > (Dispersion parameter for gaussian family taken to be 0.9458339)
> >> >> >
> >> >> >     Null deviance: 70.759  on 61  degrees of freedom
> >> >> > Residual deviance: 54.858  on 58  degrees of freedom
> >> >> > AIC: 178.36
> >> >> >
> >> >> > Number of Fisher Scoring iterations: 2
> >> >> >
> >> >> >
> >> >> > lldata as a CSV below
> >> >> > "RDate","lAbundance"
> >> >> > 2003-03-12,-3.3330699059335
> >> >> > 2003-05-21,-3.04104625745886
> >> >> > 2003-06-17,-3.04734680029566
> >> >> > 2003-07-02,-4.18791034708572
> >> >> > 2003-09-18,-3.04419201802053
> >> >> > 2003-10-22,-3.13805060873929
> >> >> > 2004-02-19,-3.80688269144794
> >> >> > 2004-03-17,-4.50755507726145
> >> >> > 2004-04-22,-4.38846502542992
> >> >> > 2004-05-19,-3.06618649442674
> >> >> > 2004-06-17,-5.20518774876304
> >> >> > 2004-07-14,-3.75041853151097
> >> >> > 2004-08-25,-3.67882486716196
> >> >> > 2004-09-22,-5.22205827512234
> >> >> > 2004-10-14,-3.99297508670535
> >> >> > 2004-11-17,-4.68793287601157
> >> >> > 2004-12-15,-4.31712380781011
> >> >> > 2005-02-16,-4.30893550479904
> >> >> > 2005-03-16,-4.05781773988454
> >> >> > 2005-05-11,-3.94746237402035
> >> >> > 2005-07-19,-4.91195185391358
> >> >> > 2005-08-17,-4.93590576323119
> >> >> > 2005-09-15,-4.85820800095518
> >> >> > 2005-10-20,-5.22956391101343
> >> >> > 2005-12-13,-5.12244047315448
> >> >> > 2006-01-18,-3.04854660925046
> >> >> > 2006-02-22,-6.77145858348375
> >> >> > 2006-03-29,-4.33151493849021
> >> >> > 2006-04-19,-3.36152357710535
> >> >> > 2006-06-20,-3.09071584142593
> >> >> > 2006-07-25,-3.31430484483825
> >> >> > 2006-08-24,-3.09974933041469
> >> >> > 2006-09-13,-3.33288992218458
> >> >> > 2007-12-17,-4.19942661980677
> >> >> > 2008-03-19,-3.86146499633625
> >> >> > 2008-04-22,-3.36161599919095
> >> >> > 2008-05-14,-4.30878307213324
> >> >> > 2008-06-18,-3.74372448768828
> >> >> > 2008-07-09,-4.65951429661651
> >> >> > 2008-08-20,-5.35984647704619
> >> >> > 2008-09-22,-4.78481898261137
> >> >> > 2008-10-20,-3.58588161980965
> >> >> > 2008-11-20,-3.10625125552057
> >> >> > 2009-02-18,-6.90675477864855
> >> >> > 2009-03-11,-3.43446932013368
> >> >> > 2009-04-23,-3.82688066341466
> >> >> > 2009-05-13,-4.44885332005661
> >> >> > 2009-06-18,-3.97671552612412
> >> >> > 2009-07-09,-3.40185954003936
> >> >> > 2009-08-19,-3.44958231694091
> >> >> > 2009-09-24,-3.86508161094726
> >> >> > 2010-01-28,-4.95281587967569
> >> >> > 2010-02-11,-3.78064756876257
> >> >> > 2010-03-24,-3.5823501176064
> >> >> > 2010-04-27,-4.33635715879999
> >> >> > 2010-05-17,-3.90545735473055
> >> >> > 2010-07-21,-3.3147176517321
> >> >> > 2010-08-11,-4.53218360860017
> >> >> > 2010-10-21,-6.90675477864855
> >> >> > 2010-11-23,-6.90675477864855
> >> >> > 2010-12-16,-6.75158176094352
> >> >> > 2011-01-11,-6.90675477864855
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Gavin Simpson, PhD
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Gavin Simpson, PhD
> >
> >
>
>
>
> --
> Gavin Simpson, PhD
>


_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
