Thanks Gavin, Your suggestion seems promising. I don't think I'll do the derivatives analysis at this time so no hurry on those codes. New question. I am wondering if there are additional considerations when making multiple comparasons with a gamm model. I have been testing a large number of parameters to see which of say 20 environmental variables are seasonal. I would expect a number of these variables to be seasonal but based on this conversation http://stats.stackexchange.com/questions/49052/are-splines-overfitting-the-data I am nervous about the variety of possible spline fits somehow compounding the problem of testing multiple variables for seasonality. Any thoughts? -Jacob
On Wed, Mar 26, 2014 at 2:34 PM, Gavin Simpson <ucfa...@gmail.com> wrote: > Sorry about the errors (typos, not syntax errors) - I was forgetting > that you'd need to use `gamm()` and hence access the `$gam` component > > I don't follow the point about a factor trending up or down. You > shouldn't try to use the `$lme` part of the model for this. > `summary(mod$gam)` should be sufficient, but as it relates to a > spline, this is more a test of whether the spline is different from a > horizontal, flat, null line. The problem with splines is that the > trend need not just be "trending up or down". In the past, to convey > where change in the trend occurs I have used the first derivative of > the fitted spline and looked for where in time the 95% confidence > interval on the first derivative of the spline doesn't include zero; > that shows the regions in time where the trend is significantly > increasing or decreasing. I cover how to do this in a blog post I > wrote: > > > http://www.fromthebottomoftheheap.net/2011/06/12/additive-modelling-and-the-hadcrut3v-global-mean-temperature-series/ > > the post contains links to the R code used for the derivatives etc, > though it is a little more complex in the case of a model with a trend > spline and seasonal spline. > > I'm supposed to have updated those codes and the post because several > people have asked me how I do the analysis for models with multiple > spline terms. If you can't get the code to work for your models, ping > me back and I'll try to move that to the top of my TO DO list. > > Note the `Xs(time)Fx1` entries in the `summary(mod$lme)` table refer > to the basis functions that represent the spline or at least to some > part of those basis functions. You can't really make much practical > use out of those values are they relate specifically to way the > penalised regression spline model has been converted into an > equivalent linear mixed effect form. > > HTH > > G > > On 26 March 2014 12:10, Jacob Cram <cramj...@gmail.com> wrote: > > Thanks again Gavin, this works. > > gamm() also models the long term trend with a spline s(Time), which is > > great. I would still like though, to be able to say whether the factor is > > trending up or down over time. Would it be fair to query > > summary(mod$lme)$tTable > > and to look at the p-value and "Value" corresponding Xs(time)Fx1 value to > > identify such a trend? > > > > Also, here are a few syntax corrections on the code provided in the last > > email: > > 1)visual appoach > > plot(mod$gam, pages = 1) > > > > 2) quantitative approach > > pred <- predict(mod$gam, newdata = newdat, type = "terms") > > take <- which(pred[,"s(DoY)"] == max(pred[,"s(DoY)"])) > > or > > take <- as.numeric(which.max(pred[,"s(DoY)"])) > > > > Cheers, > > -Jacob > > > > > > On Wed, Mar 26, 2014 at 9:46 AM, Gavin Simpson <ucfa...@gmail.com> > wrote: > >> > >> 1) Visually - unless it actually matters exactly on which day in the > >> year the peak is observed? If visually is OK, just do `plot(mod, pages > >> = 1)` to see the fitted splines on a single page. See `?plot.gam` for > >> more details on the plot method. > >> > >> 2) You could generate some new data to predict upon as follows: > >> > >> newdat <- data.frame(DoY = seq_len(366), time = mean(foo$time)) > >> > >> Then predict for those new data but collect the individual > >> contributions of the spline terms to the predicted value rather than > >> just the final prediction > >> > >> pred <- predict(mod, newdata = newdat, type = "terms") > >> > >> Then find the maximal value of the DoY contribution > >> > >> take <- which(pred$DoY == max(pred$DoY)) > >> newdat[take, , drop = FALSE] > >> > >> You could use > >> > >> take <- which.max(pred$DoY) > >> > >> instead of the `which()` I used, but only if there is a single maximal > >> value. > >> > >> This works because the spline terms in the additive model are just > >> that; additive. Hence because you haven't let the DoY and time splines > >> interact (in the simple model I mentioned, it is more complex if you > >> allow these to interact as you then need to predict DoY for each years > >> worth of time points), you can separate DoY from the other terms. > >> > >> None of the above code has been tested, but was written off top of my > >> head, but should work or at least get you pretty close to something > >> that works. > >> > >> HTH > >> > >> G > >> > >> On 26 March 2014 10:02, Jacob Cram <cramj...@gmail.com> wrote: > >> > Thanks Gavin, > >> > This seems like a promising approach and a first pass suggests it > >> > works > >> > with this data. I can't quite figure out how I would go about > >> > interrogating > >> > the fitted spline to deterine when the peak value happens with respect > >> > to > >> > DoY. Any suggestions? > >> > -Jacob > >> > > >> > > >> > On Tue, Mar 25, 2014 at 9:06 PM, Gavin Simpson <ucfa...@gmail.com> > >> > wrote: > >> >> > >> >> I would probably attack this using a GAM modified to model the > >> >> residuals as a stochastic time series process. > >> >> > >> >> For example > >> >> > >> >> require("mgcv") > >> >> mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo, > >> >> correlation = corCAR1(form = ~ time)) > >> >> > >> >> where `foo` is your data frame, `DoY` is a variable in the data frame > >> >> computed as `as.numeric(strftime(RDate, format = "%j"))` and `time` > is > >> >> a variable for the passage of time - you could do `as.numeric(RDate)` > >> >> but the number of days is probably large as we might encounter more > >> >> problems fitting the model. Instead you might do `as.numeric(RDate) / > >> >> 1000` say to produce values on a more manageable scale. The `bs = > >> >> "cc"` bit specifies a cyclic spline applicable to data measured > >> >> throughout a year. You may want to fix the start and end knots to be > >> >> days 1 and days 366 respectively, say via `knots = list(DoY = > >> >> c(0,366))` as an argument to `gam()` [I think I have this right, > >> >> specifying the boundary knots, but let me know if you get an error > >> >> about the number of knots]. The residuals are said to follow a > >> >> continuois time AR(1), the irregular-spaced counter part to the > AR(1), > >> >> plus random noise. > >> >> > >> >> There may be identifiability issues as the `s(time)` and `corCAR1()` > >> >> compete to explain the fine-scale variation. If you hit such a case, > >> >> you can make an educated guess as to the wiggliness (degrees of > >> >> freedom) for the smooth terms based on a plot of the data and fix the > >> >> splines at those values via argument `k = x` and `fx = TRUE`, where > >> >> `x` in `k = x` is some integer value. Both these go in as arguments > to > >> >> the `s()` functions. If the trend is not very non-linear you can use > a > >> >> low value 1-3 here for x and for the DoY term say 3-4 might be > >> >> applicable. > >> >> > >> >> There are other ways to approach this problem of identifiability, but > >> >> that would require more time/space here, which I can go into via a > >> >> follow-up if needed. > >> >> > >> >> You can interrogate the fitted splines to see when the peak value of > >> >> the `DoY` term is in the year. > >> >> > >> >> You can also allow the seasonal signal to vary in time with the trend > >> >> by allowing the splines to "interact" in a 2d-tensor product spline. > >> >> Using `te(DoY, time, bs = c("cc","cr"))` instead of the two `s()` > >> >> terms (or using `ti()` terms for the two "marginal" splines and the > >> >> 2-d spline). Again you can add in the `k` = c(x,y), fx = TRUE)` to > the > >> >> `te()` term where `x` and `y` are the dfs for each dimension in the > >> >> `te()` term. It is a bit more complex to do this for `ti()` terms. > >> >> > >> >> Part of the reason to prefer a spline for DoY for the seasonal term > is > >> >> that one might not expect the seasonal cycle to be a symmetric cycle > >> >> as a cos/sin terms would imply. > >> >> > >> >> A recent ecological paper describing a similar approach (though using > >> >> different package in R) is that of Claire Ferguson and colleagues in > J > >> >> Applied Ecology (2008) > http://doi.org/10.1111/j.1365-2664.2007.01428.x > >> >> (freely available). > >> >> > >> >> HTH > >> >> > >> >> G > >> >> > >> >> On 25 March 2014 19:14, Jacob Cram <cramj...@gmail.com> wrote: > >> >> > Hello all, > >> >> > I am thinking about applying season::cosinor() analysis to > some > >> >> > irregularely spaced time series data. The data are unevenly spaced, > >> >> > so > >> >> > usual time series methods, as well as the nscosinor() function are > >> >> > out. > >> >> > My > >> >> > data do however trend over time and I am wondering if I can feed > date > >> >> > as > >> >> > a > >> >> > variable into my cosinor analyis. In the example below, then I'd > >> >> > conclude > >> >> > then that the abundances are seasonal, with maximal abundance in > mid > >> >> > June > >> >> > and furthermore, they are generally decreasing over time. > >> >> > Can I use both time variables together like this? If not, is there > >> >> > some > >> >> > better approach I should take? > >> >> > Thanks in advance, > >> >> > -Jacob > >> >> > > >> >> > For context lAbundance is logg-odds transformed abundance data of a > >> >> > microbial species in a given location over time. RDate is the date > >> >> > the > >> >> > sample was collected in the r date format. > >> >> > > >> >> > > >> >> >> res <- cosinor(lAbundance ~ RDate, date = "RDate", data = lldata) > >> >> > > >> >> >>summary(res) > >> >> > Cosinor test > >> >> > Number of observations = 62 > >> >> > Amplitude = 0.58 > >> >> > Phase: Month = June , day = 14 > >> >> > Low point: Month = December , day = 14 > >> >> > Significant seasonality based on adjusted significance level of > 0.025 > >> >> > = > >> >> > TRUE > >> >> > > >> >> >>summary(res$glm) > >> >> > > >> >> > > >> >> > Call: > >> >> > glm(formula = f, family = family, data = data, offset = offset) > >> >> > > >> >> > Deviance Residuals: > >> >> > Min 1Q Median 3Q Max > >> >> > -2.3476 -0.6463 0.1519 0.6574 1.9618 > >> >> > > >> >> > Coefficients: > >> >> > Estimate Std. Error t value Pr(>|t|) > >> >> > (Intercept) 0.0115693 1.8963070 0.006 0.99515 > >> >> > RDate -0.0003203 0.0001393 -2.299 0.02514 * > >> >> > cosw -0.5516458 0.1837344 -3.002 0.00395 ** > >> >> > sinw 0.1762904 0.1700670 1.037 0.30423 > >> >> > --- > >> >> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > >> >> > > >> >> > (Dispersion parameter for gaussian family taken to be 0.9458339) > >> >> > > >> >> > Null deviance: 70.759 on 61 degrees of freedom > >> >> > Residual deviance: 54.858 on 58 degrees of freedom > >> >> > AIC: 178.36 > >> >> > > >> >> > Number of Fisher Scoring iterations: 2 > >> >> > > >> >> > > >> >> > llldata as a csv below > >> >> > "RDate","lAbundance" > >> >> > 2003-03-12,-3.3330699059335 > >> >> > 2003-05-21,-3.04104625745886 > >> >> > 2003-06-17,-3.04734680029566 > >> >> > 2003-07-02,-4.18791034708572 > >> >> > 2003-09-18,-3.04419201802053 > >> >> > 2003-10-22,-3.13805060873929 > >> >> > 2004-02-19,-3.80688269144794 > >> >> > 2004-03-17,-4.50755507726145 > >> >> > 2004-04-22,-4.38846502542992 > >> >> > 2004-05-19,-3.06618649442674 > >> >> > 2004-06-17,-5.20518774876304 > >> >> > 2004-07-14,-3.75041853151097 > >> >> > 2004-08-25,-3.67882486716196 > >> >> > 2004-09-22,-5.22205827512234 > >> >> > 2004-10-14,-3.99297508670535 > >> >> > 2004-11-17,-4.68793287601157 > >> >> > 2004-12-15,-4.31712380781011 > >> >> > 2005-02-16,-4.30893550479904 > >> >> > 2005-03-16,-4.05781773988454 > >> >> > 2005-05-11,-3.94746237402035 > >> >> > 2005-07-19,-4.91195185391358 > >> >> > 2005-08-17,-4.93590576323119 > >> >> > 2005-09-15,-4.85820800095518 > >> >> > 2005-10-20,-5.22956391101343 > >> >> > 2005-12-13,-5.12244047315448 > >> >> > 2006-01-18,-3.04854660925046 > >> >> > 2006-02-22,-6.77145858348375 > >> >> > 2006-03-29,-4.33151493849021 > >> >> > 2006-04-19,-3.36152357710535 > >> >> > 2006-06-20,-3.09071584142593 > >> >> > 2006-07-25,-3.31430484483825 > >> >> > 2006-08-24,-3.09974933041469 > >> >> > 2006-09-13,-3.33288992218458 > >> >> > 2007-12-17,-4.19942661980677 > >> >> > 2008-03-19,-3.86146499633625 > >> >> > 2008-04-22,-3.36161599919095 > >> >> > 2008-05-14,-4.30878307213324 > >> >> > 2008-06-18,-3.74372448768828 > >> >> > 2008-07-09,-4.65951429661651 > >> >> > 2008-08-20,-5.35984647704619 > >> >> > 2008-09-22,-4.78481898261137 > >> >> > 2008-10-20,-3.58588161980965 > >> >> > 2008-11-20,-3.10625125552057 > >> >> > 2009-02-18,-6.90675477864855 > >> >> > 2009-03-11,-3.43446932013368 > >> >> > 2009-04-23,-3.82688066341466 > >> >> > 2009-05-13,-4.44885332005661 > >> >> > 2009-06-18,-3.97671552612412 > >> >> > 2009-07-09,-3.40185954003936 > >> >> > 2009-08-19,-3.44958231694091 > >> >> > 2009-09-24,-3.86508161094726 > >> >> > 2010-01-28,-4.95281587967569 > >> >> > 2010-02-11,-3.78064756876257 > >> >> > 2010-03-24,-3.5823501176064 > >> >> > 2010-04-27,-4.33635715879999 > >> >> > 2010-05-17,-3.90545735473055 > >> >> > 2010-07-21,-3.3147176517321 > >> >> > 2010-08-11,-4.53218360860017 > >> >> > 2010-10-21,-6.90675477864855 > >> >> > 2010-11-23,-6.90675477864855 > >> >> > 2010-12-16,-6.75158176094352 > >> >> > 2011-01-11,-6.90675477864855 > >> >> > > >> >> > [[alternative HTML version deleted]] > >> >> > > >> >> > _______________________________________________ > >> >> > R-sig-ecology mailing list > >> >> > R-sig-ecology@r-project.org > >> >> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology > >> >> > >> >> > >> >> > >> >> -- > >> >> Gavin Simpson, PhD > >> > > >> > > >> > >> > >> > >> -- > >> Gavin Simpson, PhD > > > > > > > > -- > Gavin Simpson, PhD > [[alternative HTML version deleted]] _______________________________________________ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology